Hi,
A few of us at IBM Almaden Research Center built a distributed text index prototype called HIndex. The key design point of HIndex is that it builds the index on top of the distributed control layer in HBase, which provides availability, elasticity, and load balancing. In our prototype, we used Lucene to implement a new type of region for storing the text index.

Attached is a research paper that we wrote and submitted to USENIX 2009. It covers the design of HIndex and a performance evaluation (some of the results are applicable to HBase too).

We are grateful to the HBase community, and we welcome comments and suggestions.

(See attached file: usenix09.pdf)

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
