Re: Items to contribute (plan)

Ryan Rawson Sat, 22 Jan 2011 17:49:00 -0800

Hopefully #1 would require few, if any, changes to HFile or
HBase.  Implementing the HDFS stream API should be enough.


#2 is interesting, what is the benefit?  How did you measure said benefit?

-ryan

On Sat, Jan 22, 2011 at 5:45 PM, Ted Yu <[email protected]> wrote:
> #1 looks similar to what MapR has done.
>
> On Sat, Jan 22, 2011 at 5:18 PM, Tatsuya Kawano <[email protected]> wrote:
>
>>
>> Hi,
>>
>> I wanted to let you know that I'm planning to contribute the following
>> items to the HBase community. These are my spare-time projects and I'll only
>> be able to spend about 7 hours a week on them, so progress will be very
>> slow. I'd like your feedback to help me prioritize them. Also, if
>> someone (or a team) wants to work on them, with me or alone, I'll be happy to
>> provide more details.
>>
>>
>> 1. RADOS integration
>>
>> Run HBase not only on HDFS but also on the RADOS distributed object store
>> (the lower layer of Ceph), so that the following options become available to
>> HBase users:
>>
>> -- No SPOF (RADOS has no name node, only ZK-like monitors
>> and data nodes)
>> -- Instant backup of HBase tables (RADOS provides copy-on-write snapshots
>> per object pool)
>> -- Extra durability option for the WAL (RADOS can do both synchronous and
>> asynchronous disk flushes. HDFS doesn't offer the former)
>>
>> Note:
>> RADOS object = HFile, WAL
>> object pool = group of HFiles or WAL
>>
>> Current status: Design phase
>>
>>
>> 2. mapreduce.HFileInputFormat
>>
>> MR library to read data directly from HFiles. (Roughly 2.5 times faster
>> than TableInputFormat in my tests)
>>
>> Current status: Completed a proof-of-concept prototype and measured
>> performance.
>>
>>
>> 3. Enhance Get/Scan performance of RS
>>
>> Add a hash code and a couple of flags to HFile at flush time and
>> change the scanner implementation so that:
>>
>> -- Get/Scan operations will get faster. (Fewer key comparisons for
>> reconstructing a row: O(h * c) -> O(h).  [h = number of HFiles for the row,
>> c = number of columns in an HFile])
>> -- HFiles will become a bit smaller. (The flags will eliminate
>> duplicate bytes in keys (row, column family and qualifier) from HFiles.)
>>
>> Current status: Completed a proof-of-concept prototype and measured
>> performance.
>>
>> Details:
>> https://github.com/tatsuya6502/hbase-mr-pof/
>> (I meant "poc" not "pof"...)
>>
>>
>> 4. Writing Japanese books and documents
>>
>> -- I'm currently writing a book chapter about HBase for a Japanese NOSQL
>> book
>> -- I'll translate The Apache HBase Book into Japanese
>>
>>
>> Thank you,
>>
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> http://twitter.com/#!/tatsuya6502 <http://twitter.com/#%21/tatsuya6502>
>>
>>
>>
>
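
The O(h * c) -> O(h) claim in item 3 of the quoted message can be illustrated with a small simulation. This is only a sketch of one possible reading, not the prototype in the linked repository: the Cell class, the same_row_as_prev flag, and both reader functions are hypothetical names invented here, and each simulated HFile holds cells for a single row. The idea shown is that a flag written at flush time lets the reader compare the row key once per HFile instead of once per cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    qualifier: str
    value: str
    same_row_as_prev: bool  # hypothetical flag, assumed to be set at flush time


def make_hfile(row, columns):
    """Build one simulated HFile: a list of cells that all belong to `row`."""
    return [
        Cell(row, qualifier, value, same_row_as_prev=(i > 0))
        for i, (qualifier, value) in enumerate(columns)
    ]


def read_row_naive(hfiles, row):
    """Compare the row key on every cell: h * c comparisons."""
    comparisons = 0
    result = {}
    for hfile in hfiles:
        for cell in hfile:
            comparisons += 1
            if cell.row == row:
                # The first HFile that has a qualifier wins (newest first).
                result.setdefault(cell.qualifier, cell.value)
    return result, comparisons


def read_row_flagged(hfiles, row):
    """Compare the row key only where the flag is unset: h comparisons."""
    comparisons = 0
    result = {}
    for hfile in hfiles:
        in_row = False
        for cell in hfile:
            if not cell.same_row_as_prev:
                comparisons += 1
                in_row = cell.row == row
            if in_row:
                result.setdefault(cell.qualifier, cell.value)
    return result, comparisons


if __name__ == "__main__":
    # h = 3 HFiles, c = 4 columns each
    files = [
        make_hfile("row1", [(f"q{j}", f"v{i}") for j in range(4)])
        for i in range(3)
    ]
    print(read_row_naive(files, "row1")[1], read_row_flagged(files, "row1")[1])
```

Both readers return the same row contents. Only the number of row-key comparisons differs, which is the quantity the complexity estimate refers to.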
