Interesting question.

Would be grand if you didn't have to duplicate the HBase data in the Lucene index -- just store the HBase locations, or store the small stuff in the Lucene index and leave the big stuff back in HBase -- but perhaps the double hop of Lucene first and then to HBase will not perform well enough? HBase 0.19.0 will be better than 0.18.0 if you can wait a week or so for the release candidate to test.

Let us know how it goes Tim,
St.Ack


tim robertson wrote:
Hi All,

I have HBase running now, building Lucene indexes on Hadoop
successfully and then I will get Katta running for distributing my
indexes.

I have around 15 search fields indexed that I wish to extract and
return to the user in the result set - my result sets will be up to
millions of records...

Should I:

  a) have the values stored in the Lucene index, which will make it
slower to search but return the results immediately, in pages, without
hitting HBase

or

  b) not store the data in the index, but page over the Lucene index
and do millions of "get by ROWKEY" calls on HBase
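To make the a)/b) trade-off concrete, here is a toy model of the two read paths, with plain dicts standing in for the Lucene index and the HBase table (illustrative only -- the names and structures are made up, not real Lucene or HBase APIs):

```python
# Toy model of the two options. Dicts stand in for the Lucene index
# and the HBase table; all names here are illustrative, not real APIs.

# Option a) field values stored in the index: one lookup per hit,
# at the cost of a larger, slower-to-search index.
index_with_fields = {
    "doc1": {"rowkey": "r1", "name": "Puma concolor", "country": "AR"},
}

def fetch_option_a(doc_id):
    return index_with_fields[doc_id]      # results come straight back

# Option b) index stores only the row key: smaller index, but every
# hit costs a second round trip to HBase (the "double hop").
index_keys_only = {"doc1": "r1"}
hbase_table = {"r1": {"name": "Puma concolor", "country": "AR"}}

def fetch_option_b(doc_id):
    rowkey = index_keys_only[doc_id]      # hop 1: Lucene hit -> row key
    return hbase_table[rowkey]            # hop 2: get-by-rowkey in HBase
```

With millions of hits per result set, option b) multiplies that second hop by the hit count, which is the performance worry raised above.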

Obviously this is not happening synchronously while the user waits,
but looking forward to hear if people have done similar scenarios and
what worked out nicely...

Lucene degrades in performance at large page numbers (100th page of
1000 results), right?
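On the deep-paging point: to serve page p, a top-N collector has to gather the best p * page_size hits and throw away the first (p - 1) * page_size, so the cost grows with the offset rather than the page size. A rough Python model of that pattern (illustrative only, not Lucene code):

```python
# Why deep paging gets slower: serving page p of size k means
# collecting the top p * k hits, then slicing off the last k.
import heapq

def page_of_hits(scored_docs, page, page_size):
    """Return one page of (score, doc_id) hits, best scores first.

    Mimics requesting the top (page * page_size) docs and slicing --
    the pattern offset-based paging over a scored index forces on you.
    """
    top_n = page * page_size                   # work grows with page number
    best = heapq.nlargest(top_n, scored_docs)  # collects offset + page_size hits
    return best[(page - 1) * page_size : top_n]

docs = [(float(i), i) for i in range(10000)]   # (score, doc_id) pairs
deep_page = page_of_hits(docs, 100, 10)        # page 100 scans top 1000 hits
```

So page 100 of 10-per-page results forces the collector to track 1000 hits just to return 10.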

Thanks for any insights,

Tim
