[
https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895902#action_12895902
]
Lars Francke commented on HBASE-1861:
-------------------------------------
I have taken a stab at it. This is what I did:
* Currently, once it is decided that an HFile has become too large, it is closed
and a new one is opened. This no longer works because KeyValues for the current
row may still be coming in other column families. So now I just set a flag that
an HFile rotation is needed. On every write this flag is tested, and when it is
set and the row key changes I close all currently open HFiles.
** This gets slightly more complicated because we only _close_ the HFiles here
but don't open new ones, since they may not be needed. So every write still has
to check whether a new HFile needs to be opened.
* As we later need to know which files belong to the same region, I save them
using the current task attempt id and a counter to guarantee their uniqueness.
The current tests all run with my changes, which is a good sign.
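The flag-based rotation described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class `DeferredRolloverSketch`, its methods, and the `family#generation` naming are all hypothetical, and the "files" are just per-family byte counters rather than real HFile writers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: "files" are only byte counters per column family.
class DeferredRolloverSketch {
    private final long maxFileSize;
    // family -> bytes written to its currently open file (sorted for stable output)
    private final TreeMap<String, Long> openFiles = new TreeMap<>();
    private final List<String> closedFiles = new ArrayList<>();
    private boolean rolloverPending = false;
    private byte[] previousRow = null;
    private int generation = 0;

    DeferredRolloverSketch(long maxFileSize) {
        this.maxFileSize = maxFileSize;
    }

    void write(byte[] row, String family, int kvLength) {
        boolean rowChanged = previousRow != null && !Arrays.equals(previousRow, row);
        // Close only on a row boundary so that a row never spans two generations.
        if (rolloverPending && rowChanged) {
            closeAll();
        }
        // Lazily "open" a file for this family the first time it is written to.
        long size = openFiles.getOrDefault(family, 0L) + kvLength;
        openFiles.put(family, size);
        // Too large: request a rotation, but do not close in the middle of a row.
        if (size >= maxFileSize) {
            rolloverPending = true;
        }
        previousRow = row.clone();
    }

    private void closeAll() {
        for (String family : openFiles.keySet()) {
            closedFiles.add(family + "#" + generation);
        }
        openFiles.clear();
        generation++;
        rolloverPending = false;
    }

    List<String> closedFiles() {
        return closedFiles;
    }

    Set<String> openFamilies() {
        return openFiles.keySet();
    }
}
```

The point of the sketch is that exceeding the size limit only sets the flag; the close happens on the first write whose row key differs, and families that receive no further data never get a new file.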
The second part is the loading of those files, which seems to be more
complicated and which could use some comments. HBASE-1923 recently made this
more complicated and I'm not sure I fully understand it. Basically these are
the changes required:
* To create a new region we now have to look for the start and end keys in all
column families.
* We have to load all the column families' HFiles for a single region; those
might differ between regions.
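One way to picture the first change is to take the smallest first key and the largest last key across all families of a region. The sketch below assumes exactly that; `RegionBoundsSketch`, `compareUnsigned` and `bounds` are hypothetical names, the byte-wise unsigned comparison is written out by hand, and the loader may well derive region boundaries differently.

```java
import java.util.Map;

// Hypothetical sketch: derive one key range from the per-family key ranges.
class RegionBoundsSketch {
    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Input: family -> { firstKey, lastKey }. Output: { smallest first key, largest last key }.
    static byte[][] bounds(Map<String, byte[][]> firstAndLastKeyPerFamily) {
        byte[] start = null;
        byte[] end = null;
        for (byte[][] range : firstAndLastKeyPerFamily.values()) {
            if (start == null || compareUnsigned(range[0], start) < 0) {
                start = range[0];
            }
            if (end == null || compareUnsigned(range[1], end) > 0) {
                end = range[1];
            }
        }
        return new byte[][] { start, end };
    }
}
```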
To make both steps easier I could write an additional metadata file in
HFileOutputFormat that contains the start and end keys as well as all the
column families that have HFiles for this region. This data is available at
creation time.
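As an illustration of what such a metadata file might hold, here is a minimal sketch. The line-based `key=value` layout, the Base64 encoding of the keys, and the class `RegionMetadataSketch` are all assumptions made for this example, not an existing HBase format; it also assumes family names contain no commas.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-region metadata record: start key, end key, families with HFiles.
class RegionMetadataSketch {
    final byte[] startKey;
    final byte[] endKey;
    final List<String> families;

    RegionMetadataSketch(byte[] startKey, byte[] endKey, List<String> families) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.families = families;
    }

    // Keys are arbitrary bytes, so they are Base64-encoded to stay line-safe.
    String serialize() {
        return "startKey=" + Base64.getEncoder().encodeToString(startKey) + "\n"
            + "endKey=" + Base64.getEncoder().encodeToString(endKey) + "\n"
            + "families=" + String.join(",", families) + "\n";
    }

    static RegionMetadataSketch parse(String text) {
        Map<String, String> fields = new HashMap<>();
        for (String line : text.split("\n")) {
            // Split on the first '=' only, because Base64 padding also uses '='.
            int eq = line.indexOf('=');
            if (eq > 0) {
                fields.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        byte[] start = Base64.getDecoder().decode(fields.get("startKey"));
        byte[] end = Base64.getDecoder().decode(fields.get("endKey"));
        String familyList = fields.get("families");
        List<String> families = familyList.isEmpty()
            ? Collections.<String>emptyList()
            : Arrays.asList(familyList.split(","));
        return new RegionMetadataSketch(start, end, families);
    }
}
```

With a record like this per region, the loader would not need to open every HFile just to find the key range or to discover which families are present.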
So any input on how this would affect, or be affected by, the incremental
bulk load work would be appreciated.
> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -----------------------------------------------------------------------------
>
> Key: HBASE-1861
> URL: https://issues.apache.org/jira/browse/HBASE-1861
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 0.20.0
> Reporter: Jonathan Gray
> Fix For: 0.90.0
>
>
> Add multi-family support to bulk upload tools from HBASE-48.