Kevin,

You would want to make your row keys the words.

HBase defines its tablets (called Regions) by their startRow and endRow. So, as you say, a given region may contain "ro to ru", and looking up the word "round" would use that region. This lookup is handled automatically by the META table.

For a refresher on these concepts, check out the BigTable paper. There have also been some discussions about inverted word indexes on this mailing list, though I don't have links.

JG

> -----Original Message-----
> From: Kevin Apte [mailto:technicalarchitect2...@gmail.com]
> Sent: Monday, May 17, 2010 1:07 AM
> To: hbase-user@hadoop.apache.org
> Subject: Inverted word index...
>
> Consider a search system with an inverted word index- in other words, an
> index which points to document location- with these columns- word, document
> ID and possibly timestamp.
>
> Given a word, how will I know which tablet to scan to find all Document IDs,
> with the given word.
>
> If you are indexing a large database - say 50 TB, then each word may be
> split across multiple tablets. There may be hundreds of such tablets each
> with a large number of SSTables to store the index. How will I know which
> tablet to search for? Is there a master index that specifies which tablet
> has words with range say "ro to ru" ? Or do I have to lookup Bloom
> Filters for every tablet?
>
> Kevin
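
To make the region lookup in the reply above concrete, here is a rough Python sketch of finding the region for a word by key range. It is a toy model, not HBase client code: the region names, the split points, and the `locate_region` helper are all made up for illustration, and the sorted list only stands in for the META table. It assumes HBase's convention that a region includes its start key and excludes its end key, with an empty key meaning "unbounded".

```python
import bisect

# Toy stand-in for the META table: regions sorted by start key.
# Each entry is (start_key, end_key, region_name) and covers
# [start_key, end_key); an empty string means "unbounded".
# Region names and split points are hypothetical.
REGIONS = [
    ("", "ro", "region-1"),
    ("ro", "ru", "region-2"),
    ("ru", "", "region-3"),
]
START_KEYS = [start for start, _end, _name in REGIONS]


def locate_region(row_key):
    """Return the name of the region whose key range contains row_key."""
    # Pick the last region whose start key is <= row_key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    start, end, name = REGIONS[index]
    # Sanity check: the key really falls inside the chosen range.
    assert start <= row_key and (end == "" or row_key < end)
    return name


print(locate_region("round"))  # region-2
```

Because the row keys are the words and regions are contiguous key ranges, one ordered lookup is enough to find where "round" lives. With a real cluster the client does this for you via the META table, so application code only supplies the row key.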