Hopefully #1 would not require many (or any) changes to HFile or HBase; implementing the HDFS stream API should be enough.
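Here is a minimal sketch of what "implementing the HDFS stream API" could look like, under stated assumptions. `ObjectStore` is a hypothetical two-method stand-in for a RADOS client (it is not the librados API), and an in-memory map replaces the cluster so the code runs anywhere. A file-oriented layer only sees an output stream that it writes and closes, and an input stream that it can seek, so the storage behind those streams can change. A real integration would instead subclass Hadoop's `org.apache.hadoop.fs.FileSystem` and wrap the input side in `FSDataInputStream`, which expects the wrapped stream to implement `Seekable` and `PositionedReadable`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical object-store interface standing in for a RADOS client.
// It is NOT the librados API; it has only the two calls this sketch needs.
interface ObjectStore {
    void put(String name, byte[] data) throws IOException;
    byte[] get(String name) throws IOException;
}

// In-memory stand-in so the sketch runs without a cluster.
class InMemoryObjectStore implements ObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public void put(String name, byte[] data) { objects.put(name, data.clone()); }

    @Override
    public byte[] get(String name) throws IOException {
        byte[] data = objects.get(name);
        if (data == null) throw new IOException("no such object: " + name);
        return data;
    }
}

// Write side: buffers bytes and stores the whole object on close().
// A real WAL stream would also have to push data to the store on every sync.
class StoreOutputStream extends OutputStream {
    private final ObjectStore store;
    private final String name;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed = false;

    StoreOutputStream(ObjectStore store, String name) {
        this.store = store;
        this.name = name;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) throw new IOException("stream closed");
        buffer.write(b);
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            store.put(name, buffer.toByteArray());
            closed = true;
        }
    }
}

// Read side: HFile readers jump to block offsets, so the stream must be seekable.
class StoreInputStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    StoreInputStream(ObjectStore store, String name) throws IOException {
        this.data = store.get(name);
    }

    void seek(long offset) throws IOException {
        if (offset < 0 || offset > data.length) throw new IOException("offset out of range: " + offset);
        pos = (int) offset;
    }

    long getPos() { return pos; }

    @Override
    public int read() { return pos < data.length ? (data[pos++] & 0xff) : -1; }
}

class StreamAdapterDemo {
    public static void main(String[] args) throws IOException {
        ObjectStore store = new InMemoryObjectStore();
        // "hfile-0001" is an illustrative object name.
        try (OutputStream out = new StoreOutputStream(store, "hfile-0001")) {
            out.write("header|block1|block2".getBytes(StandardCharsets.UTF_8));
        }
        StoreInputStream in = new StoreInputStream(store, "hfile-0001");
        in.seek(7); // skip "header|"
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) sb.append((char) in.read());
        System.out.println(sb); // prints block1
    }
}
```

Running `StreamAdapterDemo` writes one object, seeks past the header and reads `block1` back. Buffering until `close()` suits immutable HFiles. The WAL would need data pushed to the store on each sync, which this sketch leaves out.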
#2 is interesting. What is the benefit, and how did you measure it?

-ryan

On Sat, Jan 22, 2011 at 5:45 PM, Ted Yu wrote:
> #1 looks similar to what MapR has done.
>
> On Sat, Jan 22, 2011 at 5:18 PM, Tatsuya Kawano wrote:
>
>> Hi,
>>
>> I wanted to let you know that I'm planning to contribute the following
>> items to the HBase community. These are spare-time projects and I'll only
>> be able to spend about 7 hours a week on them, so progress will be very
>> slow. I'd like some feedback from you to help prioritize them. Also, if
>> anyone (an individual or a team) wants to work on them, with me or alone,
>> I'll be happy to provide more details.
>>
>>
>> 1. RADOS integration
>>
>> Run HBase not only on HDFS but also on the RADOS distributed object store
>> (the lower layer of Ceph), so that the following options become available
>> to HBase users:
>>
>> -- No SPOF (RADOS has no name node(s), only ZK-like monitors and data
>> nodes)
>> -- Instant backup of HBase tables (RADOS provides copy-on-write snapshots
>> per object pool)
>> -- Extra durability option for the WAL (RADOS can do both synchronous and
>> asynchronous disk flushes; HDFS doesn't have the former option)
>>
>> Note:
>> RADOS object = HFile, WAL
>> object pool = group of HFiles or WAL
>>
>> Current status: Design phase
>>
>>
>> 2. mapreduce.HFileInputFormat
>>
>> An MR library to read data directly from HFiles (roughly 2.5 times faster
>> than TableInputFormat in my tests).
>>
>> Current status: Completed a proof-of-concept prototype and measured
>> performance.
>>
>>
>> 3. Enhance Get/Scan performance of the RS (RegionServer)
>>
>> Add a hash code and a couple of flags to HFile at flush time and change
>> the scanner implementation so that:
>>
>> -- Get/Scan operations get faster (fewer key comparisons for
>> reconstructing a row: O(h * c) -> O(h), where h = number of HFiles for the
>> row and c = number of columns in an HFile).
>> -- HFiles become a bit smaller (the flags eliminate duplicate bytes in
>> keys, i.e. row, column family and qualifier, from HFiles).
>>
>> Current status: Completed a proof-of-concept prototype and measured
>> performance.
>>
>> Details:
>> https://github.com/tatsuya6502/hbase-mr-pof/
>> (I meant "poc" not "pof"...)
>>
>>
>> 4. Writing Japanese books and documents
>>
>> -- Currently I'm authoring a book chapter about HBase for a Japanese NoSQL
>> book
>> -- I'll translate The Apache HBase Book into Japanese
>>
>>
>> Thank you,
>>
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> http://twitter.com/#!/tatsuya6502
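Item 3 in the quoted proposal states the complexity claim without the mechanism, so what follows is only one hypothetical reading of it. It models the flags alone; the hash code is not modelled. Each cell carries an assumed `sameRowAsPrevious` flag written at flush time. A scanner positioned at the start of a row then compares the row key once per HFile and follows the flags for the remaining cells. `Cell`, `RowResult` and `RowReader` are illustrative names, not HBase classes, and each toy HFile is assumed to be positioned at the target row already (seek cost is ignored).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell model. The flag is an assumed reading of the proposal:
// "this cell has the same row key as the previous cell in the same HFile".
final class Cell {
    final String row;
    final String qualifier;
    final boolean sameRowAsPrevious;

    Cell(String row, String qualifier, boolean sameRowAsPrevious) {
        this.row = row;
        this.qualifier = qualifier;
        this.sameRowAsPrevious = sameRowAsPrevious;
    }
}

// Qualifiers collected for one row plus the number of row-key comparisons spent.
final class RowResult {
    final List<String> qualifiers = new ArrayList<>();
    int comparisons = 0;
}

final class RowReader {
    // Builds one toy HFile: `columns` cells of `row`, then one cell of `nextRow`.
    static List<Cell> file(String row, int columns, String nextRow) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            cells.add(new Cell(row, "q" + i, i > 0)); // only the first cell starts a row run
        }
        cells.add(new Cell(nextRow, "q0", false));
        return cells;
    }

    // Baseline: compare every cell's row key with the target row.
    static RowResult readRowNaive(List<List<Cell>> files, String row) {
        RowResult result = new RowResult();
        for (List<Cell> file : files) {
            for (Cell cell : file) {
                result.comparisons++;
                if (!cell.row.equals(row)) {
                    break; // first cell of the next row ends this file's contribution
                }
                result.qualifiers.add(cell.qualifier);
            }
        }
        return result;
    }

    // Flag-based: compare the row key once per file; later cells in the same
    // row run are accepted from the flag alone, and a cleared flag marks the end.
    static RowResult readRowFlagged(List<List<Cell>> files, String row) {
        RowResult result = new RowResult();
        for (List<Cell> file : files) {
            boolean inRow = false;
            for (Cell cell : file) {
                if (!cell.sameRowAsPrevious) {
                    if (inRow) {
                        break; // a new row starts, so the target row is complete
                    }
                    result.comparisons++;
                    inRow = cell.row.equals(row);
                }
                if (!inRow) {
                    break;
                }
                result.qualifiers.add(cell.qualifier);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Cell>> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            files.add(file("row1", 4, "row2")); // h = 3 files, c = 4 columns each
        }
        System.out.println("naive:   " + readRowNaive(files, "row1").comparisons);   // 15
        System.out.println("flagged: " + readRowFlagged(files, "row1").comparisons); // 3
    }
}
```

With h = 3 files and c = 4 columns, the baseline spends h × (c + 1) = 15 comparisons (the extra one per file detects the next row), which is O(h · c). The flagged version spends h = 3, and both return the same 12 qualifiers.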
