Re: Hbase inverted index partitioning

Renaud Delbru Fri, 08 May 2009 05:53:04 -0700

Hi,

I am aware of a paper published at USENIX '09 about a distributed text index built on a row store:

"Leveraging a Scalable Row Store to Build a Distributed Text Index"
https://issues.apache.org/jira/secure/attachment/12397670/usenix09.pdf


It may be of interest to you.
--
Renaud Delbru

Menno Luiten wrote:

Hi everyone,

I'm working on a project in which we need a distributed inverted index, and
we are getting fair results using HBase and Hadoop (Crawlers -> Document
Repository (HBase) --M/R-> Document Index (HBase) --M/R-> Inverted Index).
However, we are also investigating more efficient ways to use this
inverted index. After reading [1], we are wondering whether anyone has figured
out a way to let an HBase cluster do document-based partitioning instead of
term-based partitioning.

Basically the question boils down to: is there an easy way to distribute
columns over multiple regions and let a client/HBase scan over multiple
regions to gather a row and its columns? And if not, are there people using
HBase for (search system) inverted indexes anyway, and how is it coping?
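
To make the question concrete, here is a minimal sketch in plain Python (no
HBase client calls) of how the row keys might differ between the two schemes.
The shard count, the "|" separator, the MD5-based hash and all function names
are assumptions made up for illustration; none of this is something HBase
provides out of the box.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of document partitions


def shard_of(doc_id):
    """Map a document to a partition with a stable hash (illustrative choice)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def term_partitioned_key(term):
    """Term-based: one row per term; every posting is a column of that row."""
    return term


def doc_partitioned_key(term, doc_id):
    """Document-based: the shard prefix comes first, so with sorted row keys
    all terms of one shard are contiguous and can live in the same region(s)."""
    return "%02d|%s" % (shard_of(doc_id), term)


def scan_keys_for_term(term):
    """Rows a client would read (one per shard) to rebuild a full posting list."""
    return ["%02d|%s" % (shard, term) for shard in range(NUM_SHARDS)]
```

Under these assumptions, a query for one term fans out to NUM_SHARDS rows
instead of one, and the client merges the partial posting lists itself.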

Greetings,

Menno Luiten

[1] B. Cambazoglu, et al. "Effects of Inverted Index Partitioning Schemes on
Performance of Query Processing in Parallel Text Retrieval Systems"
