[ 
https://issues.apache.org/jira/browse/HADOOP-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549661
 ] 

Bryan Duxbury commented on HADOOP-2075:
---------------------------------------

A really cool feature for bulk loading would be to artificially lower the split 
size so that splits occur very often, at least until there are as many 
regions as there are regionservers. That way, the load operation could have a 
lot more parallelism early on. 
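
To make the idea concrete, here is a rough Python sketch of the policy. It is 
not HBase code: the function names, the byte sizes, and the round-robin write 
model are all assumptions made for illustration, and real splits depend on 
row-key distribution rather than a simple counter.

```python
MB = 1024 * 1024


def effective_split_size(num_regions, num_regionservers,
                         normal_split_size, bulk_split_size):
    """Pick the split threshold (in bytes) for a table that is being loaded.

    Hypothetical policy: use the small bulk-load threshold until the table
    has at least one region per regionserver, then use the normal one.
    """
    if num_regions < num_regionservers:
        return bulk_split_size
    return normal_split_size


def simulate_load(total_bytes, num_regionservers,
                  normal_split_size, bulk_split_size, chunk_bytes):
    """Toy model of a load: write chunks round-robin, split at the threshold.

    Returns the list of region sizes (bytes) after the load finishes.
    Assumes keys are spread evenly, which real data may not be.
    """
    regions = [0]  # a new table starts out as a single region
    written = 0
    i = 0
    while written < total_bytes:
        idx = i % len(regions)
        regions[idx] += chunk_bytes
        written += chunk_bytes
        i += 1
        threshold = effective_split_size(len(regions), num_regionservers,
                                         normal_split_size, bulk_split_size)
        if regions[idx] >= threshold:
            # Split the region into two halves.
            size = regions[idx]
            regions[idx] = size // 2
            regions.append(size - size // 2)
    return regions
```

In this toy model, loading 64 MB onto 4 regionservers with a 256 MB normal 
threshold and a 1 MB bulk threshold reaches 4 regions after the first three 
1 MB chunks, whereas with the normal threshold alone the whole load stays in 
a single region.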

> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HADOOP-2075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2075
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via 
> the current APIs, particularly if the dataset is large and cell content is 
> small, uploads can take a long time even when using many concurrent clients.
> The PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools write 
> regions directly in HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
