[ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613402#action_12613402 ]

Jean-Daniel Cryans commented on HBASE-48:
-----------------------------------------

Something HBase should have is a BatchUpdate that takes multiple row keys. A 
simple version of it would iterate over many BatchUpdates as we already have. 
An enhanced version, when there are only a few regions, would instead do 
something like this:

 - Sort the row keys
 - Sample some rows to get an average row size
 - Using the existing region(s), the row keys to insert, and the average row 
size, figure out how the splits would be done
 - Insert the missing rows that would be the new lows and highs
 - Force the desired splits
 - Insert remaining data
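
The split-planning step above could be sketched roughly as follows. This is a hypothetical, self-contained illustration, not HBase code: `BulkSplitPlanner`, `planSplits`, and the parameter names are all made up for the example, and it assumes a simple model where every row is the sampled average size and a region should hold at most `maxRegionSize` bytes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BulkSplitPlanner {

    // Given the sorted row keys to insert, the sampled average row size,
    // and a target maximum region size, return the keys at which new
    // regions would start (the splits to force before the bulk insert).
    static List<String> planSplits(List<String> sortedKeys,
                                   long avgRowSize, long maxRegionSize) {
        List<String> splitPoints = new ArrayList<>();
        // How many rows fit in one region under the average-size model.
        long rowsPerRegion = Math.max(1, maxRegionSize / avgRowSize);
        for (long i = rowsPerRegion; i < sortedKeys.size(); i += rowsPerRegion) {
            splitPoints.add(sortedKeys.get((int) i));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(String.format("row%02d", i));
        }
        Collections.sort(keys);
        // 10 rows of ~100 bytes each, 300-byte regions: a split every 3 rows.
        System.out.println(planSplits(keys, 100, 300));
    }
}
```

The resulting split points are what would drive the "force the desired splits" step, after which each client could insert its slice of the sorted data into a distinct region without contention.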

> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via 
> the current APIs, particularly if the dataset is large and cell content is 
> small, uploads can take a long time even when using many concurrent clients.
> PNUTS folks talked of need for a different API to manage bulk upload/dump.
> Another notion would be to somehow have the bulk loader tools write 
> regions directly in hdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.