HBase inverted index partitioning

Menno Luiten Fri, 08 May 2009 05:25:45 -0700

Hi everyone,

I'm working on a project in which we need a distributed inverted index, and
we are getting fair results using HBase and Hadoop (Crawlers -> Document
Repository (HBase) --M/R-> Document Index (HBase) --M/R-> Inverted Index).
However, we are also investigating more efficient ways to use this
inverted index. After reading [1], we are wondering whether anyone has figured
out a way to let an HBase cluster do document-based partitioning instead of
term-based partitioning.


Basically the question boils down to: is there an easy way to distribute
a row's columns over multiple regions and let a client/HBase scan over multiple
regions to gather the row and its columns? And if not, are there people using
HBase for (search system) inverted indexes anyway, and how is it coping?
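
To make the question concrete, here is a minimal sketch of what document-based partitioning could look like at the row-key level. It is plain Python with no HBase calls; an in-memory dict stands in for the table, and every name in it (NUM_SHARDS, shard_of, row_key, build_index, lookup) is hypothetical. The assumption is that each term's postings are spread over several rows by prefixing the term with a shard number derived from the document ID, so that a query has to visit one row per shard and merge the results. In a real cluster the shard prefixes would presumably line up with region boundaries, but that is not shown here.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of document partitions


def shard_of(doc_id: str) -> int:
    """Assign a document to a shard with a stable hash (crc32 is the same on every run)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def row_key(term: str, doc_id: str) -> str:
    """Row key = zero-padded shard prefix + term, so one shard's rows sort together."""
    return f"{shard_of(doc_id):02d}|{term}"


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Toy index: row key -> {doc_id: term frequency}, standing in for row -> columns."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings = index[row_key(term, doc_id)]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return dict(index)


def lookup(index: dict[str, dict[str, int]], term: str) -> dict[str, int]:
    """Fan out one point lookup per shard and merge the partial posting lists."""
    merged = {}
    for shard in range(NUM_SHARDS):
        merged.update(index.get(f"{shard:02d}|{term}", {}))
    return merged


if __name__ == "__main__":
    docs = {"d1": "hbase index", "d2": "hbase hadoop hbase"}
    index = build_index(docs)
    print(lookup(index, "hbase"))
```

The trade-off this sketch illustrates is the one discussed in [1]: all postings of a single document land in the same shard, but every term lookup costs NUM_SHARDS reads instead of one.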

Greetings,

Menno Luiten

[1] B. Cambazoglu, et al. "Effects of Inverted Index Partitioning Schemes on
Performance of Query Processing in Parallel Text Retrieval Systems"
