[
https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737973#action_12737973
]
stack commented on HBASE-48:
----------------------------
To bulk upload into an already-existing, loaded table, we'd write a partitioner
that reads the list of regions from the table. Then reducers would write files
whose key span fits an extant region. Files could go into column families that
already exist -- the uploader would just ensure the new files had a
sequence id in advance of any files already in place -- or you could write a new
family.
The table should be quiescent while the upload is going on so there is no flushing
at the same time. When done, take the table offline and then online it again and
the new files will be picked up (if loading into a new column family, you'd need
to add the new family while the table is offline).
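A rough sketch of such a partitioner is below. This is a hypothetical illustration, not code from the attached patches: the class name RegionBoundaryPartitioner and the "bulkload.table" configuration key are made up, it is written against the org.apache.hadoop.mapreduce API, and it assumes an HTable whose getStartKeys() returns the sorted region start keys. The intent is one reducer per region, so each reducer's output file spans exactly one extant region.

{code:java}
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class RegionBoundaryPartitioner
    extends Partitioner<ImmutableBytesWritable, Writable>
    implements Configurable {

  private Configuration conf;
  private byte[][] startKeys;  // sorted region start keys, fetched once per task

  @Override
  public void setConf(Configuration configuration) {
    this.conf = HBaseConfiguration.create(configuration);
    try {
      // "bulkload.table" is a made-up key the job driver would have to set.
      HTable table = new HTable(conf, conf.get("bulkload.table"));
      startKeys = table.getStartKeys();
      table.close();
    } catch (IOException e) {
      throw new RuntimeException("Unable to read region start keys", e);
    }
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public int getPartition(ImmutableBytesWritable key, Writable value,
      int numReduceTasks) {
    // Route the row key to the region whose start key is <= the row key,
    // so each reducer writes a file whose key span fits a single region.
    int idx = Arrays.binarySearch(startKeys, key.get(), Bytes.BYTES_COMPARATOR);
    if (idx < 0) {
      idx = -(idx + 1) - 1;  // no exact match: take the preceding (containing) region
    }
    return Math.max(idx, 0) % numReduceTasks;
  }
}
{code}

With the reduce count set to the number of regions, reducer output maps one-to-one onto regions; the modulo is only a guard if fewer reducers are configured.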
> [hbase] Bulk load and dump tools
> --------------------------------
>
> Key: HBASE-48
> URL: https://issues.apache.org/jira/browse/HBASE-48
> Project: Hadoop HBase
> Issue Type: New Feature
> Reporter: stack
> Priority: Minor
> Attachments: 48-v2.patch, 48-v3.patch, 48-v4.patch, 48.patch,
> loadtable.rb
>
>
> HBase needs tools to facilitate bulk upload and possibly dumping. Going via
> the current APIs, particularly if the dataset is large and cell content is
> small, uploads can take a long time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions
> directly in HDFS.