Stack,

Since our prototype was based on an old version of HBase and hasn't been fully tested (as of May 2008), I didn't include the source code.
I am posting the paper first to see the reaction from the community. Architecture-wise, we think this is a good idea. However, the random read performance in HDFS today makes it too expensive to search. HBase itself suffers on reads from HDFS too. To address that, we probably need either a fast random read path in HDFS or an in-memory caching layer in HBase itself.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

stack <[email protected]> wrote on 01/12/2009 03:26:56 PM:

> Thanks Jun for posting it. Is the source available? Also, I didn't get
> a sense as to whether the authors thought this an avenue worth pursuing
> further? What's your sense?
> St.Ack
>
> Jun Rao wrote:
> > I uploaded the paper in this JIRA I just opened:
> > https://issues.apache.org/jira/browse/HBASE-1122
> >
> > Thanks,
> >
> > Jun
> >
> > [email protected] wrote on 01/11/2009 07:47:54 PM:
> >
> >> Jun,
> >>
> >> As an HBase committer I'd love to read your paper, but unfortunately it
> >> didn't go through to the mailing list. Can you provide an external link?
> >>
> >> Thanks,
> >>
> >> J-D
> >>
> >> On Sun, Jan 11, 2009 at 10:43 PM, Jun Rao <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> A few of us at IBM Almaden Research Center built a distributed text index
> >>> prototype called HIndex. The key design point of HIndex is to build the
> >>> index by leveraging the distributed control layer in HBase, for
> >>> availability, elasticity and load balancing. In our prototype, we used
> >>> Lucene to implement a new type of region for storing the text index.
> >>> Attached is a research paper that we wrote and submitted to USENIX
> >>> 2009. It covers the design of HIndex and a performance evaluation
> >>> (some of the results are applicable to HBase too).
> >>>
> >>> We are grateful for the HBase community. We welcome comments and
> >>> suggestions.
> >>>
> >>> *(See attached file: usenix09.pdf)*
> >>>
> >>> Jun
> >>> IBM Almaden Research Center
> >>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
