Kevin,

You would want to use the words themselves as your row keys, since HBase 
keeps rows sorted by row key.

HBase defines its tablets (called Regions) by their startRow and endRow.  So, 
as you say, a given region may contain "ro to ru".  Looking up the word 
"round" would use that region.  You don't have to work this out yourself: the 
client finds the right region automatically through the META table.
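
To make the lookup concrete, here is a rough sketch in Python of how a sorted 
list of region start keys maps a row key to a region.  This is only a 
conceptual model, not HBase client code: the region names, the boundaries and 
the locate_region function are all made up for illustration.

```python
from bisect import bisect_right

# Hypothetical region boundaries, sorted by start key.  Each region covers
# [its start key, the next region's start key).  The empty string stands
# for the start of the table.
REGION_START_KEYS = ["", "d", "ro", "ru"]
REGION_NAMES = ["region-0", "region-1", "region-2", "region-3"]


def locate_region(row_key):
    """Return the name of the (hypothetical) region whose range holds row_key."""
    # Find the last region whose start key is <= row_key.
    idx = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[idx]


if __name__ == "__main__":
    for word in ("apple", "round", "ruby"):
        print(word, "->", locate_region(word))
```

With these made-up boundaries, "round" sorts after "ro" and before "ru", so it 
lands in the "ro to ru" region.  No Bloom filter check across every region is 
needed just to find where a word lives.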

For a refresher on these concepts, check out the BigTable paper.  There have 
also been some discussions about inverted word indexes on this mailing list 
though I don't have links.

JG

> -----Original Message-----
> From: Kevin Apte [mailto:technicalarchitect2...@gmail.com]
> Sent: Monday, May 17, 2010 1:07 AM
> To: hbase-user@hadoop.apache.org
> Subject: Inverted word index...
> 
>     Consider a search system with an inverted word index- in other
> words, an
> index which points to document location- with these columns- word,
> document
> ID and possibly timestamp.
> 
> Given a word, how will I know which tablet to scan to find all Document
> IDs,
> with the given word.
> 
> If you are indexing a large database - say 50 TB, then each word may be
> split across multiple tablets. There may be hundreds  of such tablets
> each
> with a large number of SSTables  to store the index. How will I know
> which
> tablet to search for?  Is there a master index that specifies which
> tablet
> has words with range say "ro to ru"  ?    Or do I have to lookup Bloom
> Filters for every tablet?
> 
> Kevin
