Hi,
I am aware of a paper published at USENIX '09 about building a distributed
text index on top of a row store:
"Leveraging a Scalable Row Store to Build a Distributed Text Index"
https://issues.apache.org/jira/secure/attachment/12397670/usenix09.pdf
It might be of interest to you.
--
Renaud Delbru
Menno Luiten wrote:
Hi everyone,
I'm working on a project in which we need a distributed inverted index, and
we are getting fair results using HBase and Hadoop (Crawlers -> Document
Repository (HBase) --M/R-> Document Index (HBase) --M/R-> Inverted Index).
However, we are also investigating more efficient ways to use this
inverted index. After reading [1], we are wondering whether anyone has
figured out a way to make an HBase cluster do document-based partitioning
instead of term-based partitioning.
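To make the two schemes concrete, here is a minimal sketch in plain Python.
It does not use HBase at all; the toy corpus, the node count and the
function names are assumptions made up for illustration. With term-based
partitioning each node owns the complete posting lists of some terms; with
document-based partitioning each node owns a small local index over some of
the documents, so a query has to be sent to every node and the partial
results merged.

```python
from collections import defaultdict

# Toy corpus; document ids and contents are invented for illustration.
DOCS = {
    0: "hbase stores rows",
    1: "hadoop runs mapreduce",
    2: "hbase runs on hadoop",
    3: "inverted index rows",
}


def term_partitioned(docs, num_nodes):
    """Each node holds the complete posting lists for a subset of the terms."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for doc_id, text in sorted(docs.items()):
        for term in sorted(set(text.split())):
            # Deterministic toy hash of the term picks the owning node.
            owner = sum(map(ord, term)) % num_nodes
            nodes[owner][term].append(doc_id)
    return nodes


def doc_partitioned(docs, num_nodes):
    """Each node holds a local index over a subset of the documents."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for doc_id, text in sorted(docs.items()):
        for term in sorted(set(text.split())):
            # The document id, not the term, picks the owning node.
            nodes[doc_id % num_nodes][term].append(doc_id)
    return nodes


def lookup(nodes, term):
    """Ask every node for the term and merge the partial posting lists."""
    return sorted(d for node in nodes for d in node.get(term, []))
```

Both layouts return the same merged posting list for a term; what differs
is how many nodes hold a piece of it and therefore how the query load is
spread.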
Basically, the question boils down to: is there an easy way to distribute
a row's columns over multiple regions and let a client/HBase scan over
multiple regions to gather the row and its columns? And if not, are there
people using HBase for (search system) inverted indexes anyway, and how is
it coping?
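One possible workaround, sketched below in plain Python, is to put a
document-shard prefix in the row key so that the postings of one term end
up in several physical rows, and to have the client gather them again. This
is only an illustration under stated assumptions: it is a dictionary-backed
stand-in and not the HBase client API, and NUM_SHARDS, row_key, ToyTable and
the "doc:<id>" column naming are all hypothetical. Whether the resulting
rows actually land in different regions depends on how the table is split.

```python
NUM_SHARDS = 4  # hypothetical number of document shards


def row_key(term, doc_id):
    """Composite key: a shard prefix derived from the document id, then the term."""
    return f"{doc_id % NUM_SHARDS:02d}|{term}"


class ToyTable:
    """Dictionary-backed stand-in for a row store (not an HBase client)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, column, value):
        self.rows.setdefault(key, {})[column] = value

    def get(self, key):
        return self.rows.get(key, {})


def add_posting(table, term, doc_id, term_frequency):
    """Store one posting as a column of the sharded row for this term."""
    table.put(row_key(term, doc_id), f"doc:{doc_id}", term_frequency)


def gather(table, term):
    """Client-side gather: one get per shard, merged into one logical row."""
    merged = {}
    for shard in range(NUM_SHARDS):
        merged.update(table.get(f"{shard:02d}|{term}"))
    return merged
```

The cost of this layout is that every term lookup turns into NUM_SHARDS
reads, which is the usual trade-off of document-based partitioning.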
Greetings,
Menno Luiten
[1] B. Cambazoglu, et al. "Effects of Inverted Index Partitioning Schemes on
Performance of Query Processing in Parallel Text Retrieval Systems"