Hi Karthik,

Apologies if I was not very clear. I don't want any ranking computation. My use case is extremely simple and SQL-like, and restricted to aggregates only:
>> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"

My intent is to replace Oracle (which becomes prohibitively time-consuming beyond a certain row size) for simple aggregates like the SQL above. Unfortunately, HBase doesn't support that kind of use case, mainly because HBase is key-based: I can't really come up with a key that covers every possible query combination. Hence the need for Lucene's data structure. Loosely speaking, you can probably think of it as a special case of a join.

-Prasen

On Sun, Mar 21, 2010 at 9:51 AM, Karthik K <oss....@gmail.com> wrote:
> On Sat, Mar 20, 2010 at 9:07 PM, prasenjit mukherjee
> <prasen....@gmail.com> wrote:
>
>> I too am interested in something like this, but not necessarily for
>> ranking. My use case is for selective (search-based) aggregates like
>>
>> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"
>>
>> The data structure Lucene uses (per-field indexing and posting-stream
>> based) seems to be ideal for this kind of use case, which no other
>> open-source data structure has. I would be more than happy to stand
>> corrected.
>>
>
> If you are referring to tf-idf, then yes, you can express the relationship
> as the 'and' operation of the idf (inverse document freq) and run through
> the count.
> Check out TermEnum in the Lucene API to get more details about it. There have
> been discussions on the Lucene list about refactoring some of the
> internal structures (of Lucene) to support other algorithms like BM25(b)
> etc.
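
For illustration, the selective aggregate discussed in this thread (SELECT SUM(col1) WHERE col2='foo' and col3='bar') can be sketched with a toy per-field inverted index in plain Python. This is not Lucene's API and not how Lucene stores its postings; the names FieldIndex, add, sum_where and intersect are hypothetical, and the sketch assumes every row has a numeric col1 and that the other columns are exact-match string fields. It only shows the idea: one sorted posting list per (field, value) pair, an AND done by merging posting lists, and a sum over the stored values of the surviving rows.

```python
from collections import defaultdict


def intersect(a, b):
    """Merge-intersect two sorted lists of row ids (the AND of two posting lists)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class FieldIndex:
    """Toy per-field inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(list)  # (field, value) -> sorted row ids
        self.stored = []                   # row id -> col1 value to aggregate

    def add(self, row):
        row_id = len(self.stored)          # row ids increase, so postings stay sorted
        self.stored.append(row["col1"])
        for field, value in row.items():
            if field != "col1":
                self.postings[(field, value)].append(row_id)

    def sum_where(self, **conditions):
        """SUM(col1) over rows matching every field=value condition."""
        # Start from the shortest posting list to keep the intersections small.
        lists = sorted((self.postings.get(c, []) for c in conditions.items()), key=len)
        if not lists:
            return sum(self.stored)        # no WHERE clause: sum every row
        matched = lists[0]
        for other in lists[1:]:
            matched = intersect(matched, other)
        return sum(self.stored[r] for r in matched)


index = FieldIndex()
index.add({"col1": 10, "col2": "foo", "col3": "bar"})
index.add({"col1": 5, "col2": "foo", "col3": "baz"})
index.add({"col1": 7, "col2": "foo", "col3": "bar"})
index.add({"col1": 3, "col2": "qux", "col3": "bar"})
print(index.sum_where(col2="foo", col3="bar"))  # 17
```

No composite key over the query columns is needed here: any combination of field conditions is answered by intersecting the relevant posting lists, which is the property that makes this layout attractive compared with a purely key-based store.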