Hi folks,

The Medium Object (MOB) Storage feature (HBASE-11339 [1]) is a modified I/O and compaction path that allows individual moderately sized values (10 KB to 10 MB) to be stored with less write amplification than the normal I/O path. At a high level, it provides alternate flush and compaction mechanisms that segregate large cells into a separate area, where they are not subject to the potentially frequent compactions and splits encountered in the normal I/O path. A more detailed design doc can be found on the HBASE-11339 JIRA.

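To make the write-amplification argument concrete, here is a minimal sketch, not the actual HBase implementation. It assumes a hypothetical 100 KB cutoff, hypothetical names (`Cell`, `flush`, `bytes_rewritten`), and a simplified model in which every compaction rewrites all cells in the normal area while the separate MOB area is left untouched during those compactions.

```python
from dataclasses import dataclass

# Hypothetical cutoff chosen for this sketch only.
MOB_THRESHOLD_BYTES = 100 * 1024


@dataclass(frozen=True)
class Cell:
    row: str
    value: bytes


def flush(cells, threshold=MOB_THRESHOLD_BYTES):
    """Split flushed cells into (normal_area, mob_area) by value size."""
    normal, mob = [], []
    for cell in cells:
        (mob if len(cell.value) > threshold else normal).append(cell)
    return normal, mob


def bytes_rewritten(area, compactions):
    """Bytes rewritten if every cell in `area` is rewritten on each compaction."""
    return compactions * sum(len(c.value) for c in area)


if __name__ == "__main__":
    cells = [Cell("r1", b"x" * 200), Cell("r2", b"y" * (1024 * 1024))]
    normal, mob = flush(cells)
    # Without segregation, both cells sit in the frequently compacted area.
    print(bytes_rewritten(cells, compactions=5))
    # With segregation, only the small cell is rewritten by those compactions.
    print(bytes_rewritten(normal, compactions=5))
```

In this toy model the large value is written once at flush time and is not rewritten by the frequent compactions, which is the effect the alternate flush and compaction path is meant to achieve.
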
Jingcheng Du has been working on the MOB feature for a while, and Anoop, Ram and I have been shepherding him through the design revisions and the implementation of the feature in the hbase-11339 branch. [2]

The branch we are proposing to merge into master is compatible with HBase's core functionality, including snapshots, replication and shell support. It behaves well with table alters and bulk loads, and it does not require external MR processes. It has been documented and subjected to many integration test runs (ITBLL, ITAcidGuarantees, ITIngest), including fault injection. Performance testing of the feature shows throughput improvements of up to 2x-3x for workloads that contain MOBs. These results can be seen in the HBase 2.0 panel discussion slides from HBaseCon (once published). Some HFile encryption-related shortcomings were reported recently; we could address them either in the branch or in master.

Earlier iterations of the feature have been tested in production by users that Jingcheng has been responsible for. A version has also been deployed at users I have been responsible for. Some of the folks from Huawei (ashutosh) have also been submitting the recent encryption bug reports against the hbase-11339 branch, so there is some evidence of usage by them.

The four of us (Jingcheng, Ram, Anoop and I) are satisfied with the feature and feel it is a good time to call a merge vote. I've posted a megapatch version for folks who want to peruse the code. [3]

What do you all think?

Thanks,
Jingcheng, Jon, Ram, and Anoop.

[1] https://issues.apache.org/jira/browse/HBASE-11339
[2] https://github.com/apache/hbase/tree/hbase-11339
[3] https://reviews.apache.org/r/34475/

-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// [email protected] // @jmhsieh
