There has been some recent effort to integrate Hive with HBase
(TableInputFormat-s), which was presented last week.
You can check that integration to see whether this use case can be addressed
out of the box.
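
For comparison, the per-field posting-list idea from the quoted message below
can be sketched in a few lines of Python. This is only a toy, in-memory
illustration and not Lucene's, Hive's, or HBase's actual API: the names
FieldIndex, intersect, and sum_where are hypothetical, and a single numeric
column to aggregate is assumed.

```python
from collections import defaultdict


def intersect(a, b):
    """Two-pointer intersection of two ascending lists of row ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class FieldIndex:
    """Hypothetical toy index: one posting list per (field, value) pair."""

    def __init__(self):
        self.postings = defaultdict(list)  # (field, value) -> ascending row ids
        self.values = []                   # row id -> number to aggregate (col1)

    def add(self, fields, value):
        """Index one row; row ids are assigned in increasing order."""
        row_id = len(self.values)
        self.values.append(value)
        for field, term in fields.items():
            self.postings[(field, term)].append(row_id)
        return row_id

    def sum_where(self, **conditions):
        """SUM(value) over rows matching every field=value condition."""
        if not conditions:
            return sum(self.values)
        lists = [self.postings.get(item, []) for item in conditions.items()]
        lists.sort(key=len)                # start from the shortest list
        matched = lists[0]
        for other in lists[1:]:
            matched = intersect(matched, other)
        return sum(self.values[row_id] for row_id in matched)


index = FieldIndex()
index.add({"col2": "foo", "col3": "bar"}, 10)
index.add({"col2": "foo", "col3": "baz"}, 5)
index.add({"col2": "foo", "col3": "bar"}, 7)
# SELECT SUM(col1) WHERE col2='foo' AND col3='bar'
print(index.sum_where(col2="foo", col3="bar"))  # 17
```

Because each (field, value) pair has its own posting list, any combination of
equality conditions can be answered by intersecting lists, without designing a
row key per query shape.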


On Sat, Mar 20, 2010 at 10:07 PM, prasenjit mukherjee
<prasen....@gmail.com> wrote:

> Hi Karthik,
>
> Apologies if I was not very clear. I don't want any ranking
> computation. My use case is extremely simple and SQL-like, and is
> restricted to aggregates only.
>
> >> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"
>
> My intent is to replace Oracle (which is prohibitively time-consuming
> beyond a certain rowsize) for simple aggregates like the SQL above.
> Unfortunately, HBase doesn't support that kind of use case, mainly
> because HBase is key-based: I can't really come up with a key that
> covers every possible query combination. Hence the need for Lucene's
> data structure. Loosely speaking, you can probably think of it as a
> special case of a join.
>
> -Prasen
>
> On Sun, Mar 21, 2010 at 9:51 AM, Karthik K <oss....@gmail.com> wrote:
> > On Sat, Mar 20, 2010 at 9:07 PM, prasenjit mukherjee
> > <prasen....@gmail.com> wrote:
> >
> >> I too am interested in something like this, but not necessarily for
> >> ranking. My use case is selective (search-based) aggregates like
> >>
> >> "SELECT SUM(col1) WHERE col2='foo' and col3='bar'"
> >>
> >> The data structure Lucene uses (per-field indexing, posting-stream
> >> based) seems ideal for this kind of use case, which no other
> >> open-source data structure has. I would be more than happy to stand
> >> corrected.
> >>
> >
> > If you are referring to tf-idf, then yes, you can express the
> > relationship as the 'and' operation of the idf (inverse document
> > frequency) and run through the count.
> > Check out TermEnum in the Lucene API for more details. There have
> > been discussions on the Lucene list about refactoring some of
> > Lucene's internal structures to support other algorithms such as
> > BM25(b), etc.
> >
>
