Interesting question.
Would be grand if you didn't have to duplicate the hbase data in the
lucene index, just store the hbase locations -- or, just store small
stuff in the lucene index and leave big-stuff back in hbase -- but
perhaps the double hop of lucene first and then to hbase will not
perform well enough? HBase 0.19.0 will be better than 0.18.0 if you can
wait a week or so for the release candidate to test.
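The "store the hbase locations" idea above amounts to a two-hop lookup: the Lucene index keeps only the small search fields plus the HBase row key, and the full record is fetched from HBase by that key. A minimal sketch of the pattern, using plain maps to stand in for the index and the table (the class and method names are illustrative, not real Lucene or HBase API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HybridLookup {
    // Stand-in for the Lucene index: search term -> matching row keys.
    static Map<String, List<String>> index = new HashMap<>();
    // Stand-in for the HBase table: row key -> full record.
    static Map<String, String> table = new HashMap<>();

    static {
        index.put("puma", List.of("row-001", "row-002"));
        table.put("row-001", "full record 1");
        table.put("row-002", "full record 2");
    }

    // First hop: the index returns only row keys.
    // Second hop: each key is resolved against the table, like a Get by row key.
    static List<String> lookup(String term) {
        List<String> records = new ArrayList<>();
        for (String rowKey : index.getOrDefault(term, List.of())) {
            records.add(table.get(rowKey));
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(lookup("puma"));
    }
}
```

The index stays small because it never duplicates the big payload; the cost is one extra fetch per hit, which is exactly the double hop whose performance is in question.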
Let us know how it goes Tim,
St.Ack
tim robertson wrote:
Hi All,
I have HBase running now, building Lucene indexes on Hadoop
successfully and then I will get Katta running for distributing my
indexes.
I have around 15 search fields indexed, and I wish to return all 15 to
the user in the result set - my result sets will be up to millions of
records...
Should I:
a) store the values in the Lucene index, which will make searching
slower but returns the results immediately in pages without hitting
HBase
or
b) not store the data in the index, but page over the Lucene index
and do millions of "get by ROWKEY" calls on HBase
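One way to soften option (b) is to issue the gets lazily, per page: page over the index hits and fetch only the rows on the page being served, so the millions of gets are spread across page requests rather than done up front. A small sketch of that slicing, with an illustrative pageOf helper (not a real Lucene or HBase call):

```java
import java.util.List;

public class Paging {
    // Return the slice of row keys belonging to the given 0-based page,
    // so only those keys need a "get by ROWKEY" when the page is served.
    static List<String> pageOf(List<String> hits, int page, int pageSize) {
        int from = page * pageSize;
        int to = Math.min(from + pageSize, hits.size());
        return from >= to ? List.of() : hits.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> hits = List.of("r0", "r1", "r2", "r3", "r4");
        System.out.println(pageOf(hits, 1, 2)); // second page: [r2, r3]
    }
}
```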
Obviously this is not happening synchronously while the user waits,
but I am looking forward to hearing whether people have done similar
scenarios and what worked out nicely...
Lucene degrades in performance at large page numbers (e.g. the 100th
page of 1000 results), right?
Thanks for any insights,
Tim