[ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566338#action_12566338 ]

Billy Pearson commented on HBASE-48:
------------------------------------

Would not the best way to do this be a map that formats and sorts the data per 
column family, then a reduce that writes MapFiles directly into the regions' 
column directories?

Then that would skip the API and speed up the loading of the data, and it 
would not matter so much whether we have 1 region or not, since all we would 
be doing is adding a MapFile to HDFS.
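
For what it's worth, a minimal sketch of that reduce side, assuming Hadoop's 
MapFile API and the old mapred interfaces; the "bulkload.target.dir" config 
property here is hypothetical, since a real tool would derive the output path 
from the target region's column family directory in HDFS:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MapFileWritingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private MapFile.Writer writer;

  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      // Hypothetical config key; a real tool would point this at the
      // region's column family directory in HDFS.
      String dir = job.get("bulkload.target.dir")
          + "/part-" + job.getInt("mapred.task.partition", 0);
      writer = new MapFile.Writer(job, fs, dir, Text.class, Text.class);
    } catch (IOException e) {
      throw new RuntimeException("could not open MapFile writer", e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> collector, Reporter reporter)
      throws IOException {
    // Reduce input arrives sorted by key, which is exactly the order
    // MapFile.Writer.append() requires; the output collector is unused.
    while (values.hasNext()) {
      writer.append(key, values.next());
    }
  }

  public void close() throws IOException {
    writer.close();
  }
}
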
Of course the map would have to know whether there is 1 region or 1000 and 
split the data accordingly, but even if each map only produces a few lines of 
data per column family, the compactor will come along sooner or later and 
clean up and split where needed.
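
And to make the "split the data accordingly" part concrete, a hedged sketch of 
a partitioner that routes each row key to the reduce covering its region; the 
"bulkload.region.startkeys" property is an assumption for illustration, since 
a real loader would read the boundaries from the META table:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class RegionBoundaryPartitioner implements Partitioner<Text, Text> {

  // Region start keys, sorted ascending.  Passing them through a
  // hypothetical config property is an assumption for illustration.
  private Text[] startKeys;

  public void configure(JobConf job) {
    String[] keys = job.get("bulkload.region.startkeys", "").split(",");
    startKeys = new Text[keys.length];
    for (int i = 0; i < keys.length; i++) {
      startKeys[i] = new Text(keys[i]);
    }
  }

  public int getPartition(Text key, Text value, int numPartitions) {
    // Binary search for the last region whose start key is <= the row
    // key, so each row lands in the reduce writing that region's MapFile.
    // (Assumes one reduce per region when numPartitions >= region count.)
    int lo = 0, hi = startKeys.length - 1, region = 0;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (startKeys[mid].compareTo(key) <= 0) {
        region = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return region % numPartitions;
  }
}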

So if we add 100 MapFiles to one column, I would assume it would slow reads 
down a little bit, having to sort through all the MapFiles while scanning, but 
that would be a temporary speed problem.


> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via 
> the current APIs, particularly if the dataset is large and cell content is 
> small, uploads can take a long time even when using many concurrent clients.
> PNUTS folks talked of a need for a different API to manage bulk upload/dump.
> Another notion would be to somehow have the bulk loader tools write regions 
> directly in HDFS.
