After reading the first few pages: very interesting and helpful indeed!

Although it is not clear to me from the JIRA and the paper whether HIndex will
eventually be contributed to the project (any clarification from the guys at
IBM?), this will definitely help us get to grips with the basic idea and
its place within HBase.

Greetings,
Menno

-----Original Message-----
From: Renaud Delbru [mailto:[email protected]]
Sent: Friday, 8 May 2009 14:52
To: [email protected]
Subject: Re: Hbase inverted index partitioning

Hi,

I am aware of a paper published at usenix09 about a distributed text 
index using a row store:
"Leveraging a Scalable Row Store to Build a Distributed Text Index"
https://issues.apache.org/jira/secure/attachment/12397670/usenix09.pdf

Could be interesting for you maybe.
-- 
Renaud Delbru

Menno Luiten wrote:
> Hi everyone,
>
> I'm working on a project in which we need a distributed inverted index, and
> are getting some fair results using HBase and Hadoop (Crawlers -> Document
> Repository (HBase) --M/R-> Document Index (HBase) --M/R-> Inverted Index).
> However, we are also investigating more efficient methods to use this
> inverted index. So after reading [1] we are wondering if anyone figured out a
> way to let an HBase cluster do document-based partitioning instead of
> term-based partitioning.
>
> Basically the question boils down to: is there an easy way to distribute
> columns over multiple regions and let a client/HBase scan over multiple
> regions to gather a row and its columns? And if not, are there people using
> HBase for (search system) inverted indexes anyway, and how is it coping?
>
> Greetings,
>
> Menno Luiten
>
> [1] B. Cambazoglu, et al. "Effects of Inverted Index Partitioning Schemes on
> Performance of Query Processing in Parallel Text Retrieval Systems"
>
>   
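
Note on the question quoted above: one possible way to approximate
document-based partitioning in a row store is to prefix each row key with a
shard id derived from the document id, so that a term's postings are spread
over several key ranges and a lookup reads one row per shard. The sketch below
only illustrates that key layout in plain Python, with an in-memory dict
standing in for a table. NUM_SHARDS, the "shard|term" key format, and all
function names are assumptions made for illustration; they are not HBase APIs
and are not something proposed in this thread.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_of(doc_id: str) -> int:
    """Map a document id to a shard with a stable hash (CRC32)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def doc_partitioned_key(term: str, doc_id: str) -> str:
    """Row key for document-based partitioning: shard prefix, then the term.

    With term-based partitioning the row key would be the term alone, so all
    postings for a term would live in a single row.
    """
    return f"{shard_of(doc_id):02d}|{term}"


def build_index(docs: dict) -> dict:
    """Build a toy inverted index: row key -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[doc_partitioned_key(term, doc_id)].add(doc_id)
    return dict(index)


def lookup(index: dict, term: str) -> list:
    """Fan out over every shard prefix and merge the partial posting lists."""
    term = term.lower()
    hits = set()
    for shard in range(NUM_SHARDS):
        hits |= index.get(f"{shard:02d}|{term}", set())
    return sorted(hits)


docs = {
    "d1": "hbase inverted index",
    "d2": "hadoop index",
    "d3": "hbase region",
}
index = build_index(docs)
print(lookup(index, "index"))
```

The trade-off is the one the question hints at: each query has to touch every
shard, but no single row has to hold the full posting list of a frequent term.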

