Hi all, I have HBase running now and am building Lucene indexes on Hadoop successfully. Next I will get Katta running to distribute my indexes.
I have around 15 indexed search fields, and I want to return the values of all 15 to the user in the result set. My result sets will be up to millions of records. Should I:

a) Store the values in the Lucene index, which will make it slower to search but returns the results immediately in pages without hitting HBase, or
b) Not store the data in the index, but page over the Lucene index and do millions of "get by ROWKEY" calls on HBase?

Obviously this will not happen synchronously while the user waits, but I look forward to hearing whether people have handled similar scenarios and what worked out nicely.

Lucene degrades in performance at large page numbers (e.g. the 100th page of 1000 results), right?

Thanks for any insights,
Tim
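
P.S. To make the paging concern concrete, here is a rough Python sketch of how I picture it. It is not Lucene or HBase code: `fetch_page` and `batches` are hypothetical helpers, and the hits are plain (score, rowkey) tuples standing in for index results. My assumption is that a top-N search has to retain page * page_size hits to serve a given page, which is why deep pages would cost more, and that option (b) would group row keys into batches rather than issue one get per row.

```python
import heapq


def fetch_page(scored_hits, page, page_size):
    """Return the hits for a 1-based page, best score first.

    scored_hits is an iterable of (score, rowkey) tuples, a hypothetical
    stand-in for whatever the index returns.
    """
    # Assumption: serving page N means ranking the top N * page_size hits,
    # so the work grows with the page number.
    needed = page * page_size
    top = heapq.nlargest(needed, scored_hits)
    return top[(page - 1) * page_size:needed]


def batches(rowkeys, batch_size):
    """Yield row keys in groups, so lookups can be issued per batch."""
    for start in range(0, len(rowkeys), batch_size):
        yield rowkeys[start:start + batch_size]


if __name__ == "__main__":
    hits = [(score, "row%d" % score) for score in range(1000)]
    page = fetch_page(hits, page=3, page_size=10)
    rowkeys = [rowkey for _, rowkey in page]
    for group in batches(rowkeys, batch_size=4):
        # A real implementation would do a multi-row lookup here.
        print(group)
```

If that picture is right, the cost of option (b) is dominated by how many row keys go into each batch, while the cost of deep paging is paid inside the search itself either way.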
