[ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566400#action_12566400 ]

stack commented on HBASE-48:
----------------------------

If it's a new table, make a region per reducer (configure many reducers if 
the table is big).  The framework will have done the sorting for us 
(lexicographically, if that's our key compare function).  We might have to 
add to the framework to ensure we don't split a key in the middle of a row.
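For instance, here's a rough sketch of a range partitioner that compares on 
the row portion of the key only (the class name, the "row/column" key layout, 
and the "bulkload.split.rows" property are all made up for illustration). 
Because it compares on row alone, a region boundary can never land in the 
middle of a row, and because it range-partitions, each reducer's sorted 
output is one contiguous key range, i.e. one region:

{code:java}
import java.util.Arrays;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class RowRangePartitioner implements Partitioner<Text, Text> {
  private String[] splitRows;  // sorted; one fewer than the reducer count

  public void configure(JobConf job) {
    // Hypothetical property carrying pre-chosen split rows.
    splitRows = job.get("bulkload.split.rows", "").split(",");
    Arrays.sort(splitRows);
  }

  public int getPartition(Text key, Text value, int numPartitions) {
    // Compare on the row portion only (everything before the first '/'),
    // so all cells of a row land in the same reducer/region.
    String k = key.toString();
    int slash = k.indexOf('/');
    String row = slash == -1 ? k : k.substring(0, slash);
    int idx = Arrays.binarySearch(splitRows, row);
    // The insertion point tells us which range the row falls into; an
    // exact hit on a split row goes to the range that starts with it.
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }
}
{code}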

If the table already exists, it would be one reducer per existing region, 
and yeah, there'll be a splitting and compacting price to pay.
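Same idea as above, only the partition boundaries come from the current 
region start keys instead of chosen split points.  A sketch (assumes the 
sorted start keys were fetched up front, e.g. from the META region):

{code:java}
import java.util.Arrays;
import java.util.Comparator;

public class RegionBoundaryLookup {
  // Raw byte comparison, i.e. a lexicographic compare on the row.
  private static final Comparator<byte[]> BYTES = new Comparator<byte[]>() {
    public int compare(byte[] a, byte[] b) {
      for (int i = 0; i < Math.min(a.length, b.length); i++) {
        int diff = (a[i] & 0xff) - (b[i] & 0xff);
        if (diff != 0) return diff;
      }
      return a.length - b.length;
    }
  };

  private final byte[][] startKeys;  // sorted start key of each region

  public RegionBoundaryLookup(byte[][] sortedStartKeys) {
    this.startKeys = sortedStartKeys;
  }

  /** Returns the index of the region (== reducer) whose range holds row. */
  public int getPartition(byte[] row) {
    int idx = Arrays.binarySearch(startKeys, row, BYTES);
    // A miss means row sorts between two start keys; it belongs to the
    // region whose start key precedes it.  The first region's start key
    // is empty, so every row sorts at or after it.
    return idx >= 0 ? idx : -(idx + 1) - 1;
  }
}
{code}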

To see the difference in speed going via the API versus writing directly to 
mapfiles, see the primitive PerformanceEvaluation and compare the numbers 
for writing mapfiles directly rather than going through the API.
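For reference, the "write mapfiles directly" side of that comparison boils 
down to something like this sketch using Hadoop's MapFile.Writer (real store 
files key on HStoreKey; the path, key format, and sizes here are arbitrary):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class DirectMapFileWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    MapFile.Writer writer = new MapFile.Writer(conf, fs,
        "/tmp/bulkload-test", Text.class, BytesWritable.class);
    try {
      byte[] value = new byte[1000];  // small cells, the slow case via API
      BytesWritable bw = new BytesWritable(value);
      for (int i = 0; i < 1000000; i++) {
        // MapFile requires keys appended in sorted order; zero-padding
        // keeps lexicographic order consistent with numeric order.
        writer.append(new Text(String.format("%010d", i)), bw);
      }
    } finally {
      writer.close();
    }
  }
}
{code}

No RPC and no region server in the loop, which is where the speedup over 
going via the API comes from.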

> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via 
> the current APIs, particularly if the dataset is large and cell content is 
> small, uploads can take a long time even when using many concurrent clients.
> The PNUTS folks talked of the need for a different API to manage bulk 
> upload/dump.
> Another notion would be to have the bulk loader tools somehow write 
> regions directly in hdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
