[ 
https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895902#action_12895902
 ] 

Lars Francke commented on HBASE-1861:
-------------------------------------

I have taken a stab at it. This is what I did:

* Currently, once an HFile is deemed too large, it is closed and a new one is 
opened. This no longer works because KeyValues for the current row may still 
arrive for other column families. So now I only set a flag indicating that an 
HFile rotation is needed. This flag is tested on every write, and when it is 
set and the row key changes, I close all currently open HFiles
** This is slightly more complicated because we only _close_ the HFiles here 
and do not open new ones, since they may not be needed. So every write still 
has to check whether a new HFile needs to be opened
* As we later need to know which files belong to the same region, I save them 
using the current task attempt id and a counter to guarantee their uniqueness 
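
A minimal sketch of the deferred rotation described above, in plain Java. It is 
not the actual patch and does not use the HFileOutputFormat API: the class 
DeferredRolloverSketch, its OpenFile stand-in, and the naming scheme 
family/attemptId_counter are all hypothetical. It also assumes the counter 
advances once per rotation, so that files closed together share a suffix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the deferred rotation; not the HBase API. */
class DeferredRolloverSketch {
    /** Stand-in for an open HFile writer: only a name and a byte count. */
    static final class OpenFile {
        final String name;
        long bytes;
        OpenFile(String name) { this.name = name; }
    }

    private final String taskAttemptId;
    private final long maxFileSize;
    private final Map<String, OpenFile> openByFamily = new LinkedHashMap<>();
    private final List<String> closedFiles = new ArrayList<>();
    private boolean rollRequested = false;
    private String previousRow = null;
    private int generation = 0; // assumed: one counter value per rotation

    DeferredRolloverSketch(String taskAttemptId, long maxFileSize) {
        this.taskAttemptId = taskAttemptId;
        this.maxFileSize = maxFileSize;
    }

    void write(String row, String family, int keyValueSize) {
        // Rotate only at a row boundary, so one row never spans two file sets.
        if (rollRequested && !row.equals(previousRow)) {
            closeAll();
            rollRequested = false;
        }
        // Files are opened lazily: a family gets one only when it is written to.
        OpenFile file = openByFamily.get(family);
        if (file == null) {
            file = new OpenFile(family + "/" + taskAttemptId + "_" + generation);
            openByFamily.put(family, file);
        }
        file.bytes += keyValueSize;
        // Too large: request a rotation but keep writing until the row changes.
        if (file.bytes >= maxFileSize) {
            rollRequested = true;
        }
        previousRow = row;
    }

    /** Closes every open file and advances the per-rotation counter. */
    void closeAll() {
        for (OpenFile file : openByFamily.values()) {
            closedFiles.add(file.name);
        }
        openByFamily.clear();
        generation++;
    }

    List<String> closedFiles() { return closedFiles; }

    List<String> openFiles() {
        List<String> names = new ArrayList<>();
        for (OpenFile file : openByFamily.values()) {
            names.add(file.name);
        }
        return names;
    }
}
```

In this model a family that receives no KeyValues after a rotation never gets 
a new file, which is why the open check has to stay on the write path.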

All existing tests still pass with my changes, which is a good sign.

The second part is loading those files, which seems to be more complicated and 
where I could use some input. HBASE-1923 recently made this more complicated, 
and I'm not sure I fully understand it. Basically, these are the changes 
required:

* To create a new region we now have to look for the start and end keys across 
all column families
* We have to load the HFiles of all column families for a single region; the 
set of families may differ between regions

To make both steps easier, HFileOutputFormat could write an additional 
metadata file containing the start and end keys as well as all the column 
families that have HFiles for this region. This data is available when the 
files are created.
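
As a rough illustration of what such a metadata record might hold, here is a 
small Java sketch. Everything in it is an assumption: RegionMetadataSketch and 
fromFamilyRanges are made-up names, and the input is taken to be the first and 
last row key written per column family. The region range is then the smallest 
first key and the largest last key, compared as unsigned bytes.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical per-region metadata; not an existing HBase class. */
class RegionMetadataSketch {
    final byte[] startKey;
    final byte[] endKey;
    final SortedSet<String> families;

    RegionMetadataSketch(byte[] startKey, byte[] endKey, SortedSet<String> families) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.families = families;
    }

    /**
     * Builds the record from a map of family name to {firstRowKey, lastRowKey}.
     * Keys stay null if the map is empty.
     */
    static RegionMetadataSketch fromFamilyRanges(Map<String, byte[][]> ranges) {
        byte[] start = null;
        byte[] end = null;
        SortedSet<String> families = new TreeSet<>();
        for (Map.Entry<String, byte[][]> entry : ranges.entrySet()) {
            byte[] first = entry.getValue()[0];
            byte[] last = entry.getValue()[1];
            if (start == null || compare(first, start) < 0) {
                start = first;
            }
            if (end == null || compare(last, end) > 0) {
                end = last;
            }
            families.add(entry.getKey());
        }
        return new RegionMetadataSketch(start, end, families);
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With a record like this the loader would not have to open every HFile to find 
the region boundaries or to discover which families are present.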

Any input on how this would affect, or be affected by, the incremental bulk 
loading work would be appreciated.

> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-1861
>                 URL: https://issues.apache.org/jira/browse/HBASE-1861
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.90.0
>
>
> Add multi-family support to bulk upload tools from HBASE-48.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
