While in Beijing I met with a group at the Institute of Computing at the Chinese Academy of Sciences who are interested in contributing a secondary indexing scheme for HBase. It is my understanding that this is the same group that contributed RCFile to Hive. The links below point to a slide deck and a technical report describing their work, called CCIndex.

Slides: https://iridiant.s3.amazonaws.com/ccindex_v1.pdf
Paper: https://iridiant.s3.amazonaws.com/CCIndex.pdf

We discussed initially posting their code -- based on 0.20.1 -- up on GitHub, and this was agreed. This should be happening soon.

We also discussed a possible path for contributing this work in maintainable, distributable form as a coprocessor-based reimplementation. This would involve considering support in the framework for what CCIndex needs at a low level (I/O concerns) and splitting out the rest into a coprocessor.

I've heard other talk of implementing secondary indexing on a coprocessor foundation. I think CCIndex is one option on the table, a starting point for discussion.

Best regards,

- Andy
