The last time I looked hard at this, it was to address some serious performance issues with HBase. I eventually fixed those issues, and so dropped the idea altogether.
-ryan

On Mon, Jul 27, 2009 at 1:52 PM, stack <[email protected]> wrote:
> Latest thinking is to write an MR job that, in the reducer, writes hfiles
> that are just under a region size (<256M). When the reducer has written
> about 240MB, it opens a new file. (May need to write a custom ReduceRunner
> to keep account of what's been written and to rotate the file.)
>
> After the MR job has finished, a script would come along and move the
> hfiles into the appropriate directory structure. Each hfile would be the
> sole content of a region. The script would read each hfile's first and
> last keys from its metadata and then, using this metainfo along with a
> table format specified externally, insert an entry into .META. per region
> (see the scripts in bin -- copy and rename table -- for examples of how
> to manipulate .META.).
>
> Someone needs to just do it. We've been talking about it forever.
>
> St.Ack
>
> P.S. Here is older thinking on the topic:
> https://issues.apache.org/jira/browse/HBASE-48
>
> On Mon, Jul 27, 2009 at 1:31 PM, tim robertson <[email protected]> wrote:
>>
>> Hi all,
>>
>> Ryan wrote on a different thread:
>>
>> "It should be possible to randomly insert data from a pre-existing
>> data set. There is some work to directly import straight into hfiles
>> and skip the regionserver, but that would only really work on one-time
>> imports to new tables."
>>
>> Could someone please elaborate on this a little and outline the steps
>> needed? Do you write an hfile in a custom mapreduce output format and
>> then somehow write the table metadata file afterwards?
>>
>> Cheers,
>>
>> Tim
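The reducer-side bookkeeping stack describes (roll to a fresh hfile once roughly 240MB has been written, so each file stays under the 256M region size) could be sketched roughly as below. This is a minimal, HBase-free illustration: the class name `RollingHFileWriter` and the placeholder file names are hypothetical, and a real reducer would wrap HBase's actual hfile writer instead of just counting bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (no HBase dependencies) of the reducer-side bookkeeping:
// count bytes written and roll to a new hfile once a threshold is reached,
// so every file stays safely under the maximum region size.
public class RollingHFileWriter {
    // ~240MB default, comfortably under the 256MB region size.
    static final long DEFAULT_ROLL_THRESHOLD = 240L * 1024 * 1024;

    private final long rollThreshold;
    private long bytesInCurrentFile = 0;
    private final List<String> filesOpened = new ArrayList<>();

    public RollingHFileWriter() {
        this(DEFAULT_ROLL_THRESHOLD);
    }

    public RollingHFileWriter(long rollThreshold) {
        this.rollThreshold = rollThreshold;
        openNewFile();
    }

    // Called for each key/value the reducer emits.
    public void write(byte[] key, byte[] value) {
        long entrySize = key.length + value.length;
        if (bytesInCurrentFile + entrySize > rollThreshold) {
            openNewFile(); // rotate: each hfile becomes one region's sole content
        }
        bytesInCurrentFile += entrySize;
        // ... a real reducer would append the cell to the current hfile here ...
    }

    private void openNewFile() {
        bytesInCurrentFile = 0;
        filesOpened.add("hfile-" + (filesOpened.size() + 1)); // placeholder name
    }

    public List<String> filesOpened() {
        return filesOpened;
    }
}
```

Because the threshold is just a constructor parameter here, the same rotation logic can be exercised with small numbers instead of 240MB of data.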

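The follow-up script step stack outlines (read each hfile's first and last keys from its metadata, then derive one .META. entry per hfile) might look something like the sketch below. `HFileInfo` and `RegionEntry` are hypothetical stand-ins for real hfile metadata and .META. rows; the only HBase convention assumed is that the first region's start key and the last region's end key are empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of deriving per-region boundaries from the first/last keys of a
// set of hfiles, where each hfile is the sole content of one region.
public class RegionEntryBuilder {

    // Hypothetical stand-in for metadata read from a real hfile.
    static class HFileInfo {
        final String path, firstKey, lastKey;
        HFileInfo(String path, String firstKey, String lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    // Hypothetical stand-in for the row that would be inserted into .META.
    static class RegionEntry {
        final String startKey, endKey;
        RegionEntry(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Sort hfiles by first key, then assign boundaries: the first region
    // starts at the empty key, the last region ends at the empty key, and
    // each interior boundary is the next hfile's first key.
    static List<RegionEntry> buildEntries(List<HFileInfo> hfiles) {
        List<HFileInfo> sorted = new ArrayList<>(hfiles);
        sorted.sort(Comparator.comparing(f -> f.firstKey));
        List<RegionEntry> entries = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            String start = (i == 0) ? "" : sorted.get(i).firstKey;
            String end = (i == sorted.size() - 1) ? "" : sorted.get(i + 1).firstKey;
            entries.add(new RegionEntry(start, end));
        }
        return entries;
    }
}
```

A real script would additionally verify that the key ranges do not overlap before touching .META., since overlapping regions would corrupt the table.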