Consider a search system with an inverted word index, in other words an
index that points to document locations, with these columns: word, document
ID, and possibly a timestamp.

Given a word, how will I know which tablet to scan to find all document IDs
that contain that word?

If you are indexing a large database, say 50 TB, then each word's entries
may be split across multiple tablets. There may be hundreds of such tablets,
each with a large number of SSTables to store the index. How will I know
which tablet to search? Is there a master index that specifies which tablet
holds the words in a given range, say "ro" to "ru"? Or do I have to look up
the Bloom filters for every tablet?
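
To make the question concrete, here is a minimal sketch of the kind of
master index I have in mind. It is only an illustration: the split points,
the tablet names, and the find_tablet helper are all made up, and I am
assuming each tablet covers a contiguous, sorted range of words ending at
(and including) its end key.

```python
import bisect

# Hypothetical split points. Tablet i holds every word that sorts after the
# previous end key, up to and including TABLET_END_KEYS[i].
TABLET_END_KEYS = ["f", "m", "ro", "ru"]

# One more tablet than end keys: the last tablet has no upper bound.
TABLET_NAMES = ["tablet-0", "tablet-1", "tablet-2", "tablet-3", "tablet-4"]


def find_tablet(word):
    """Return the name of the single tablet whose range contains `word`."""
    # Binary search for the first end key that is >= word.
    idx = bisect.bisect_left(TABLET_END_KEYS, word)
    return TABLET_NAMES[idx]


if __name__ == "__main__":
    for w in ["apple", "ro", "rose", "zebra"]:
        print(w, "->", find_tablet(w))
```

With something like this, a lookup for "rose" would go straight to the one
tablet covering the range after "ro" up to "ru", with no need to consult
the Bloom filters of every tablet.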

Kevin
