Yeah, I considered working on this -- but we can import our entire production DB in just a few hours, and then it's all incremental from there. So bulk insert isn't a huge use case for us.
On Tue, Jul 28, 2009 at 9:19 AM, Jonathan Gray <[email protected]> wrote:
> Though HBase imports are fairly fast, they would probably be 5-10x
> faster with a straight-to-hfile import method.
>
> Once we get 0.20.0 shipped, we should have more time to spend on
> actually implementing this. Though anyone is welcome to take a shot.
> Stack described it well.
>
> JG
>
> Ryan Rawson wrote:
>>
>> The last time I seriously looked at this, it was to address serious
>> performance issues with HBase. I eventually fixed said performance
>> issues, and so dropped the idea.
>>
>> -ryan
>>
>> On Mon, Jul 27, 2009 at 1:52 PM, stack <[email protected]> wrote:
>>>
>>> Latest thinking is to write an MR job whose reducer writes hfiles
>>> that are just under a region in size (<256M). When the reducer has
>>> written about 240MB, it opens a new file. (We may need to write a
>>> custom ReduceRunner to keep account of what's been written and to
>>> rotate the file.)
>>>
>>> After the MR has finished, a script would come along and move the
>>> hfiles into the appropriate directory structure. Each hfile would
>>> be the sole content of its region. The script would read each
>>> hfile's first and last keys from its metadata and then, using this
>>> metainfo along with a table schema specified externally, insert an
>>> entry into .META. per region (see the copy table and rename table
>>> scripts in bin for examples of how to manipulate .META.).
>>>
>>> Someone needs to just do it. We've been talking about it forever.
>>>
>>> St.Ack
>>> P.S. Here is older thinking on the topic:
>>> https://issues.apache.org/jira/browse/HBASE-48
>>>
>>> On Mon, Jul 27, 2009 at 1:31 PM, tim robertson
>>> <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Ryan wrote on a different thread:
>>>>
>>>> "It should be possible to randomly insert data from a pre-existing
>>>> data set. There is some work to directly import straight into
>>>> hfiles and skipping the regionserver, but that would only really
>>>> work on one-time imports to new tables."
>>>>
>>>> Could someone please elaborate on this a little and outline the
>>>> steps needed? Do you write an hfile in a custom mapreduce output
>>>> format and then somehow write the table metadata file afterwards?
>>>>
>>>> Cheers,
>>>>
>>>> Tim

--
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
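
For concreteness, here is a rough sketch of the reducer-side hfile
rotation Stack describes, written against the 0.20-era APIs as best I
know them. It assumes the job runs with a single reducer (or a
total-order partitioner) so rows arrive sorted; the class name, the
hfile.output.dir conf key, and the fixed family/qualifier are made up
for illustration, and the HFile.Writer constructor may differ from
what actually ships:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.mapreduce.Reducer;

  public class HFileWritingReducer
      extends Reducer<ImmutableBytesWritable, BytesWritable,
                      ImmutableBytesWritable, BytesWritable> {

    // Roll just under the 256M region size Stack mentions.
    private static final long ROLL_AT = 240L * 1024 * 1024;
    private static final byte[] FAMILY = Bytes.toBytes("f");
    private static final byte[] QUALIFIER = Bytes.toBytes("q");

    private FileSystem fs;
    private Path outDir;
    private HFile.Writer writer;
    private long written = 0;
    private int fileCount = 0;

    @Override
    protected void setup(Context ctx) throws IOException {
      fs = FileSystem.get(ctx.getConfiguration());
      // "hfile.output.dir" is a made-up conf key for this sketch.
      outDir = new Path(ctx.getConfiguration().get("hfile.output.dir"));
    }

    @Override
    protected void reduce(ImmutableBytesWritable row,
        Iterable<BytesWritable> values, Context ctx)
        throws IOException, InterruptedException {
      // Rotate only at row boundaries so a single row never straddles
      // two hfiles -- a row has to live entirely inside one region.
      if (writer == null || written >= ROLL_AT) {
        rotate();
      }
      for (BytesWritable v : values) {
        byte[] value = new byte[v.getLength()];
        System.arraycopy(v.getBytes(), 0, value, 0, v.getLength());
        KeyValue kv = new KeyValue(row.get(), FAMILY, QUALIFIER,
            System.currentTimeMillis(), value);
        writer.append(kv);   // appends must arrive in sorted order
        written += kv.getLength();
      }
    }

    // Close the current hfile and open the next one; this is the
    // "keep account of what's been written and rotate" bookkeeping.
    private void rotate() throws IOException {
      if (writer != null) {
        writer.close();
      }
      Path p = new Path(outDir, String.format("hfile-%05d", fileCount++));
      writer = new HFile.Writer(fs, p);  // defaults; a real job would
                                         // set block size/compression
      written = 0;
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
      if (writer != null) {
        writer.close();
      }
    }
  }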

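And a similarly rough sketch of the follow-up script that stamps a
.META. row out of each hfile, here as Java rather than shell for
clarity. One subtlety Stack's description implies: region end keys are
exclusive and adjacent regions have to butt up against each other, so
the sketch sorts the hfiles by first row and uses each neighbour's
first row as the previous region's end key, with empty start/end keys
at the edges. The hard-coded table descriptor stands in for the "table
schema specified externally", and the API details (HFile.Reader
arguments, KeyValue.createKeyValueFromKey, the catalog family
constants) are from trunk as I remember them, so check them before
trusting this:

  import java.util.TreeMap;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.hbase.util.Writables;

  public class MetaLoader {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      FileSystem fs = FileSystem.get(conf);

      // Stand-in for the externally specified table schema.
      HTableDescriptor desc = new HTableDescriptor("mytable");
      desc.addFamily(new HColumnDescriptor("f"));

      // Collect each hfile's first row, sorted, so neighbours can
      // supply each other's region boundaries.
      TreeMap<byte[], Path> firstRows =
          new TreeMap<byte[], Path>(Bytes.BYTES_COMPARATOR);
      for (FileStatus f : fs.listStatus(new Path(args[0]))) {
        HFile.Reader r = new HFile.Reader(fs, f.getPath(), null, false);
        r.loadFileInfo();
        // getFirstKey() returns the full KeyValue key; pull the row out.
        byte[] row =
            KeyValue.createKeyValueFromKey(r.getFirstKey()).getRow();
        firstRows.put(row, f.getPath());
        r.close();
      }

      // End keys are exclusive: each region ends where the next one
      // begins; the first start key and last end key are empty.
      HTable meta = new HTable(conf, HConstants.META_TABLE_NAME);
      byte[][] starts = firstRows.keySet().toArray(new byte[0][]);
      for (int i = 0; i < starts.length; i++) {
        byte[] start = (i == 0)
            ? HConstants.EMPTY_BYTE_ARRAY : starts[i];
        byte[] end = (i == starts.length - 1)
            ? HConstants.EMPTY_BYTE_ARRAY : starts[i + 1];
        HRegionInfo info = new HRegionInfo(desc, start, end);
        Put p = new Put(info.getRegionName());
        p.add(HConstants.CATALOG_FAMILY,
            HConstants.REGIONINFO_QUALIFIER, Writables.getBytes(info));
        meta.put(p);
      }
    }
  }

A real script would also move each hfile into its region's family
directory under the table dir and get the regions assigned; the copy
table and rename table scripts in bin show the .META. mechanics.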