... and you've seen http://github.com/akkumar/hbasene and http://github.com/thkoch2001/lucehbase?
St.Ack
On Mon, May 17, 2010 at 1:07 AM, Kevin Apte <technicalarchitect2...@gmail.com> wrote:
> Consider a search system with an inverted word index, in other words, an
> index which points to document location, with these columns: word, document
> ID and possibly timestamp.
>
> Given a word, how will I know which tablet to scan to find all Document IDs
> with the given word?
>
> If you are indexing a large database, say 50 TB, then each word may be
> split across multiple tablets. There may be hundreds of such tablets, each
> with a large number of SSTables to store the index. How will I know which
> tablet to search? Is there a master index that specifies which tablet
> has words with range say "ro to ru"? Or do I have to look up Bloom
> Filters for every tablet?
>
> Kevin
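
For what it's worth, the "master index" asked about above can be pictured as a sorted list of tablet start keys. Bigtable keeps tablet locations in a METADATA table and HBase keeps region locations in a catalog table, and in both cases rows are stored sorted by key, so finding the tablet for a word is a binary search over boundaries rather than a Bloom filter check per tablet (Bloom filters only help skip SSTables inside a tablet). Below is a minimal sketch of that lookup, not actual Bigtable or HBase client code. The boundaries, the tablet names, the helper function names, and the "word + document ID" row key layout are all made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical boundaries: tablet i holds row keys in
# [TABLET_START_KEYS[i], TABLET_START_KEYS[i + 1]).
TABLET_START_KEYS = ["", "d", "ma", "ro", "ru"]
TABLET_NAMES = ["tablet-0", "tablet-1", "tablet-2", "tablet-3", "tablet-4"]


def locate_tablet(row_key):
    """Return the one tablet whose key range contains row_key."""
    # The last start key that is <= row_key identifies the tablet.
    idx = bisect_right(TABLET_START_KEYS, row_key) - 1
    return TABLET_NAMES[idx]


def tablets_for_range(start_key, stop_key):
    """Return every tablet overlapping the scan range [start_key, stop_key)."""
    first = bisect_right(TABLET_START_KEYS, start_key) - 1
    # Tablets whose start key is >= stop_key cannot hold rows in the range.
    last = bisect_left(TABLET_START_KEYS, stop_key) - 1
    return TABLET_NAMES[first:last + 1]


def tablets_for_word(word):
    """Assumes row keys of the form word + '/' + doc_id, so all postings
    for one word are contiguous and form a single prefix scan."""
    return tablets_for_range(word + "/", word + "/\xff")


if __name__ == "__main__":
    print(locate_tablet("rose"))
    print(tablets_for_word("rose"))
    print(tablets_for_range("m", "s"))
```

If a very common word does spill across a tablet boundary, the same range lookup simply returns more than one tablet, and the scan walks them in key order.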