There has been some recent effort to integrate Hive with HBase (via TableInputFormats), which was presented last week. You can check whether that integration addresses this use case out of the box.

On Sat, Mar 20, 2010 at 10:07 PM, prasenjit mukherjee <prasen....@gmail.com> wrote:

> Hi Karthik,
>
> Apologies if I was not very clear. I don't want any ranking
> computation. My use case is extremely simple and is SQL-like , that
> too restricted to only aggregates.
>
> >> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'.
>
> My intent is to replace oracle ( as it is prohibitively time-consuming
> beyond a certain rowsize ) for simple aggregate ( like the above SQLs
> ) . And unfortunately hbase doesnt support that kind of use-case.
> Mainly because hbase is key-based. I cant really come up with a key
> based on all possible query combination in hbase. Hence the need for
> lucene's data structure. In a loose term you can probably think of it
> as a special case of join.
>
> -Prasen
>
> On Sun, Mar 21, 2010 at 9:51 AM, Karthik K <oss....@gmail.com> wrote:
> > On Sat, Mar 20, 2010 at 9:07 PM, prasenjit mukherjee
> > <prasen....@gmail.com> wrote:
> >
> >> I too am interested in something like this, but not necessarily for
> >> ranking. My use case is for selective(search based)-aggregates like
> >>
> >> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'.
> >>
> >> The data structure lucene uses ( per-field-indexing and posting-stream
> >> based) seems to be ideal for these kind of use case which no other
> >> open-source-data-structure has. I would be more than happy to stand
> >> corrected.
> >>
> >
> > if you are referring to tf-idf , then yes, you can express the relationship
> > as the 'and' operation of the idf (inverse document freq). and run through
> > the count.
> > Checkout TermEnum in the lucene api to get more details about it. There have
> > been discussions in the lucene list , about refactoring of some of the
> > internal structures (of lucene) to support other algorithms like BM25(b)
> > etc.
> >
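
For reference, the per-field posting-list idea discussed in the thread can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not Lucene's API or on-disk format: the names `build_index`, `intersect`, and `sum_where` are hypothetical, the rows live in memory, and the summed column is read straight from the row list. It shows how `SELECT SUM(col1) WHERE col2='foo' AND col3='bar'` reduces to intersecting two sorted posting lists and summing over the surviving row ids, without needing a composite key per query combination.

```python
from collections import defaultdict


def build_index(rows, indexed_fields):
    """Map each (field, value) pair to a sorted list of row ids (a posting list)."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        for field in indexed_fields:
            # Row ids are appended in ascending order, so each list stays sorted.
            index[(field, row[field])].append(row_id)
    return index


def intersect(a, b):
    """Merge-intersect two sorted posting lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def sum_where(rows, index, sum_field, **conditions):
    """Sum `sum_field` over rows matching every field=value condition (logical AND)."""
    if not conditions:
        return sum(row[sum_field] for row in rows)
    postings = [index.get((f, v), []) for f, v in conditions.items()]
    postings.sort(key=len)  # start from the shortest list to keep intersections small
    matches = postings[0]
    for plist in postings[1:]:
        matches = intersect(matches, plist)
    return sum(rows[row_id][sum_field] for row_id in matches)


# Made-up sample data, for illustration only.
rows = [
    {"col1": 10, "col2": "foo", "col3": "bar"},
    {"col1": 5, "col2": "foo", "col3": "baz"},
    {"col1": 7, "col2": "foo", "col3": "bar"},
    {"col1": 3, "col2": "qux", "col3": "bar"},
]
index = build_index(rows, ["col2", "col3"])
total = sum_where(rows, index, "col1", col2="foo", col3="bar")
print(total)  # 17
```

Any combination of indexed fields can be queried the same way, since each field has its own posting lists and the AND is computed at query time.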