Re: lucene index{reader|writer} on hbase (GSOC idea?)

prasenjit mukherjee Sat, 20 Mar 2010 22:07:51 -0700

Hi Karthik,

Apologies if I was not very clear. I don't want any ranking
computation. My use case is extremely simple and SQL-like, and it is
restricted to aggregates only:


>> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"

My intent is to replace Oracle (it becomes prohibitively slow beyond
a certain number of rows) for simple aggregates like the SQL above.
Unfortunately, HBase doesn't support that kind of use case, mainly
because HBase is key-based: I can't come up with a row key that covers
every possible query combination. Hence the need for Lucene's data
structure. Loosely speaking, you can think of it as a special case of
a join.
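To make the posting-list idea concrete, here is a minimal in-memory sketch in plain Java (deliberately not Lucene's API). Each (field, value) pair maps to a sorted list of row IDs. The WHERE clause becomes an intersection of two such lists, and the SUM reads col1 only for the matching rows. The class name, method names, and the list used as a col1 "column store" are all hypothetical. A real Lucene-based version would get the postings from the index reader and col1 from a per-document value store such as stored fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-field inverted index plus a col1 value array.
class PostingAggregate {
    // field -> (term -> sorted row IDs)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    // col1 values indexed by row ID (stand-in for a per-document value store)
    private final List<Long> col1 = new ArrayList<>();

    public void addRow(long col1Value, String col2Value, String col3Value) {
        int rowId = col1.size();
        col1.add(col1Value);
        index("col2", col2Value, rowId);
        index("col3", col3Value, rowId);
    }

    private void index(String field, String term, int rowId) {
        // Row IDs are appended in increasing order, so each list stays sorted.
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new ArrayList<>())
                .add(rowId);
    }

    private List<Integer> postingList(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, List.of());
    }

    // AND of two sorted posting lists via a two-pointer merge.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) i++;
            else j++;
        }
        return out;
    }

    // SELECT SUM(col1) WHERE col2 = v2 AND col3 = v3
    public long sumCol1Where(String v2, String v3) {
        long sum = 0;
        for (int row : intersect(postingList("col2", v2), postingList("col3", v3))) {
            sum += col1.get(row);
        }
        return sum;
    }

    public static void main(String[] args) {
        PostingAggregate idx = new PostingAggregate();
        idx.addRow(10, "foo", "bar");
        idx.addRow(5, "foo", "baz");
        idx.addRow(7, "qux", "bar");
        idx.addRow(3, "foo", "bar");
        System.out.println(idx.sumCol1Where("foo", "bar")); // prints 13
    }
}
```

No key has to be designed per query combination. Any conjunction of field=value predicates is answered by intersecting the lists it names, which is the property HBase's single row key lacks.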

-Prasen

On Sun, Mar 21, 2010 at 9:51 AM, Karthik K <oss....@gmail.com> wrote:
> On Sat, Mar 20, 2010 at 9:07 PM, prasenjit mukherjee
> <prasen....@gmail.com> wrote:
>
>> I too am interested in something like this, but not necessarily for
>> ranking. My use case is selective (search-based) aggregates like
>>
>> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"
>>
>> The data structure Lucene uses (per-field indexing, posting-stream
>> based) seems ideal for this kind of use case, and no other
>> open-source data structure has it. I would be more than happy to stand
>> corrected.
>>
>
> If you are referring to tf-idf, then yes, you can express the relationship
> as an 'and' operation over the idf (inverse document frequency) and run
> through the count.
> Check out TermEnum in the Lucene API for more details. There have
> been discussions on the Lucene list about refactoring some of Lucene's
> internal structures to support other algorithms like BM25(b), etc.
>
