[
https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613402#action_12613402
]
Jean-Daniel Cryans commented on HBASE-48:
-----------------------------------------
Something HBase should have is a BatchUpdate that takes multiple row keys. A
simple version of it would issue many BatchUpdates, as we already do, in a
loop. An enhanced version would instead do something like this when there are
only a few regions :
- Sort the row keys
- Sample some rows to get an average row size
- Using the existing region(s), the row keys to insert, and the average row
size, figure out how the splits would be done
- Insert the missing rows that would be the new lows and highs
- Force the desired splits
- Insert remaining data
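The split-planning part of the steps above could be sketched roughly as below. This is a hypothetical, simplified illustration (the class name, method, and the idea of a byte-size region cap are all assumptions, not existing HBase code): sort the keys, use the sampled average row size to work out how many rows fit in a region, and pick the keys where splits would be forced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the pre-split planning described above. */
public class BulkSplitPlanner {

    /**
     * Given the row keys to insert, a sampled average row size, and a
     * target maximum region size, return the keys at which splits would
     * be forced so each resulting region stays under the target size.
     */
    static List<String> planSplits(List<String> rowKeys,
                                   long avgRowSizeBytes,
                                   long maxRegionSizeBytes) {
        // Step 1: sort the row keys (natural order stands in for
        // HBase's lexicographic byte ordering here).
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort(null);

        // From the sampled average row size, estimate how many rows a
        // region of the target size can hold.
        long rowsPerRegion = Math.max(1, maxRegionSizeBytes / avgRowSizeBytes);

        // Every rowsPerRegion-th key becomes the low key of a new region.
        List<String> splitPoints = new ArrayList<>();
        for (int i = (int) rowsPerRegion; i < sorted.size(); i += rowsPerRegion) {
            splitPoints.add(sorted.get(i));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row9", "row1", "row5", "row3", "row7");
        // Assume ~100-byte rows and a tiny, illustrative 200-byte region cap.
        System.out.println(planSplits(keys, 100, 200));  // [row5, row9]
    }
}
```

With the split points known, the loader would insert the boundary rows, force the splits, and then stream in the remaining data, so no region ever has to split mid-load.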
> [hbase] Bulk load and dump tools
> --------------------------------
>
> Key: HBASE-48
> URL: https://issues.apache.org/jira/browse/HBASE-48
> Project: Hadoop HBase
> Issue Type: New Feature
> Reporter: stack
> Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping. Going via
> the current APIs, particularly if the dataset is large and cell content is
> small, uploads can take a long time even when using many concurrent clients.
> PNUTS folks talked of need for a different API to manage bulk upload/dump.
> Another notion would be to somehow have the bulk loader tools write
> regions directly in HDFS.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.