The last time I looked hard at this, it was to address some serious performance issues with HBase. I eventually fixed those issues, and so dropped the idea altogether.
-ryan

On Mon, Jul 27, 2009 at 1:52 PM, stack <[email protected]> wrote:
> Latest thinking is to write an MR job that, in the reducer, writes hfiles
> that are just under a region size (<256M). When the reducer has written
> about 240MB, it opens a new file. (May need to write a custom ReduceRunner
> to keep account of what's been written and to rotate the file.)
>
> After the MR job has finished, a script would come along and move the
> hfiles into the appropriate directory structure. Each hfile would be the
> sole content of a region. The script would read each hfile's first and
> last keys from its metadata and then, using this metainfo along with a
> table format specified externally, insert an entry into .META. per region
> (see the scripts in bin -- copy and rename table -- for examples of how
> to manipulate .META.).
>
> Someone needs to just do it. We've been talking about it forever.
>
> St.Ack
>
> P.S. Here is older thinking on the topic:
> https://issues.apache.org/jira/browse/HBASE-48
>
> On Mon, Jul 27, 2009 at 1:31 PM, tim robertson <[email protected]> wrote:
>>
>> Hi all,
>>
>> Ryan wrote on a different thread:
>>
>> "It should be possible to randomly insert data from a pre-existing
>> data set. There is some work to directly import straight into hfiles
>> and skip the regionserver, but that would only really work on one-time
>> imports to new tables."
>>
>> Could someone please elaborate on this a little and outline the steps
>> needed? Do you write an hfile in a custom mapreduce output format and
>> then somehow write the table metadata file afterwards?
>>
>> Cheers,
>>
>> Tim
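The reducer-side bookkeeping stack describes (roll to a fresh hfile once roughly 240MB has been written, so each file stays under the 256M region size) could be sketched roughly as below. This is a minimal, HBase-free illustration: the class name `RollingHFileWriter` and the placeholder file names are hypothetical, and a real reducer would wrap HBase's actual hfile writer instead of just counting bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (no HBase dependencies) of the reducer-side bookkeeping:
// count bytes written and roll to a new hfile once a threshold is reached,
// so every file stays safely under the maximum region size.
public class RollingHFileWriter {
    // ~240MB default, comfortably under the 256MB region size.
    static final long DEFAULT_ROLL_THRESHOLD = 240L * 1024 * 1024;

    private final long rollThreshold;
    private long bytesInCurrentFile = 0;
    private final List<String> filesOpened = new ArrayList<>();

    public RollingHFileWriter() {
        this(DEFAULT_ROLL_THRESHOLD);
    }

    public RollingHFileWriter(long rollThreshold) {
        this.rollThreshold = rollThreshold;
        openNewFile();
    }

    // Called for each key/value the reducer emits.
    public void write(byte[] key, byte[] value) {
        long entrySize = key.length + value.length;
        if (bytesInCurrentFile + entrySize > rollThreshold) {
            openNewFile(); // rotate: each hfile becomes one region's sole content
        }
        bytesInCurrentFile += entrySize;
        // ... a real reducer would append the cell to the current hfile here ...
    }

    private void openNewFile() {
        bytesInCurrentFile = 0;
        filesOpened.add("hfile-" + (filesOpened.size() + 1)); // placeholder name
    }

    public List<String> filesOpened() {
        return filesOpened;
    }
}
```

Because the threshold is just a constructor parameter here, the same rotation logic can be exercised with small numbers instead of 240MB of data.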

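The follow-up script step stack outlines (read each hfile's first and last keys from its metadata, then derive one .META. entry per hfile) might look something like the sketch below. `HFileInfo` and `RegionEntry` are hypothetical stand-ins for real hfile metadata and .META. rows; the only HBase convention assumed is that the first region's start key and the last region's end key are empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of deriving per-region boundaries from the first/last keys of a
// set of hfiles, where each hfile is the sole content of one region.
public class RegionEntryBuilder {

    // Hypothetical stand-in for metadata read from a real hfile.
    static class HFileInfo {
        final String path, firstKey, lastKey;
        HFileInfo(String path, String firstKey, String lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    // Hypothetical stand-in for the row that would be inserted into .META.
    static class RegionEntry {
        final String startKey, endKey;
        RegionEntry(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Sort hfiles by first key, then assign boundaries: the first region
    // starts at the empty key, the last region ends at the empty key, and
    // each interior boundary is the next hfile's first key.
    static List<RegionEntry> buildEntries(List<HFileInfo> hfiles) {
        List<HFileInfo> sorted = new ArrayList<>(hfiles);
        sorted.sort(Comparator.comparing(f -> f.firstKey));
        List<RegionEntry> entries = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            String start = (i == 0) ? "" : sorted.get(i).firstKey;
            String end = (i == sorted.size() - 1) ? "" : sorted.get(i + 1).firstKey;
            entries.add(new RegionEntry(start, end));
        }
        return entries;
    }
}
```

A real script would additionally verify that the key ranges do not overlap before touching .META., since overlapping regions would corrupt the table.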