[ 
https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896082#action_12896082
 ] 

Lars Francke commented on HBASE-1861:
-------------------------------------

Okay, after talking with Lars George and looking over the incremental load 
stuff from Todd, I've got even more questions.
It seems that there are now two distinct ways to bulk load data into HBase, 
and we should really document this somewhere:

loadtable.rb creates regions manually and just creates the metadata to be 
picked up by the metascanner. This seems like it is not very resource intensive 
(after the HFiles have been generated).

And then there's the new completebulkload tool which shifts some of the load to 
HBase itself by (and please correct me if I understood this wrong) possibly 
splitting a lot of the existing regions and basically depending on HBase to put 
HFiles in appropriate places. This is a great solution for incremental loads, as 
the regions already exist. But is it a good solution, performance- and load-wise, 
for an empty table? My knowledge of HBase in this regard is still limited, but I 
would have thought that the constant splitting would be pretty bad, especially 
when starting with an empty table with no regions.
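
To make the concern concrete, here is a toy sketch in Python. It is not HBase 
code and does not reproduce what completebulkload does internally; the function 
names, row keys and file names are made up for illustration. It only models 
which existing region an HFile's key range falls into, assuming each region is 
identified by its start key and covers everything up to the next start key.

```python
from bisect import bisect_right

def region_index(region_start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    region_start_keys must be sorted and begin with b"" (the first region's
    empty start key). This is a simplified model, not an HBase API.
    """
    return bisect_right(region_start_keys, row_key) - 1

def place_hfiles(region_start_keys, hfiles):
    """Group HFiles by the single region they fit into.

    hfiles is a list of (name, first_row_key, last_row_key) tuples.
    Files whose key range crosses a region boundary are returned separately.
    """
    placed = {}
    straddling = []
    for name, first_key, last_key in hfiles:
        first = region_index(region_start_keys, first_key)
        last = region_index(region_start_keys, last_key)
        if first == last:
            placed.setdefault(first, []).append(name)
        else:
            straddling.append(name)
    return placed, straddling

# Hypothetical HFiles described by their first and last row key.
hfiles = [
    ("hfile-a", b"apple", b"fig"),
    ("hfile-b", b"grape", b"melon"),
    ("hfile-c", b"kiwi", b"zucchini"),
]

empty_table = [b""]                # modeled as one region covering all keys
presplit_table = [b"", b"g", b"n"]  # three regions created up front

print(place_hfiles(empty_table, hfiles))
print(place_hfiles(presplit_table, hfiles))
```

In this model every file lands in the same single region of the empty table, 
while the pre-split table spreads them out and leaves one file crossing a 
region boundary. What the tool does with such a file, and how much splitting 
follows in the single-region case, is exactly the open question above and is 
not modeled here.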

I'd love your input on how to solve this: should multiple column families be 
supported only for empty tables (via loadtable.rb), only by the incremental 
bulk load tool, or by both?
This also raises the question of whether we should keep loadtable.rb if it is 
a better fit for "cold imports".

> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-1861
>                 URL: https://issues.apache.org/jira/browse/HBASE-1861
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.90.0
>
>
> Add multi-family support to bulk upload tools from HBASE-48.

