That paper looks great.  Could you add a link to it here:
http://wiki.apache.org/hadoop/HBase/Articles
Is the software available?
Thanks,
St.Ack

On Mon, May 17, 2010 at 10:22 AM, Ioannis Konstantinou
<ik...@cslab.ntua.gr> wrote:
> Hi,
>
> you can also read the following paper:
> http://www.cslab.ntua.gr/~ikons/distributed_indexing_of_webscale_datasets_for_the_cloud_mdac_2010_cr.pdf
> In it we present an inverted index system based on HBase (both the index and
> the content are served through HBase, and indexing is performed through
> Hadoop MapReduce functions).
>
> On 17/5/2010 6:44 PM, Jonathan Gray wrote:
>>
>> Kevin,
>>
>> You would want to make your row keys the words.
>>
>> HBase defines its tablets (called Regions) by their startRow and endRow.
>> So, as you say, a given region may contain "ro to ru".  Looking up the word
>> "round" would use that region.  This is handled automatically by the META
>> table.
>>
>> For a refresher on these concepts, check out the BigTable paper.  There
>> have also been some discussions about inverted word indexes on this mailing
>> list though I don't have links.
>>
>> JG
>>
>>
>>>
>>> -----Original Message-----
>>> From: Kevin Apte [mailto:technicalarchitect2...@gmail.com]
>>> Sent: Monday, May 17, 2010 1:07 AM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: Inverted word index...
>>>
>>> Consider a search system with an inverted word index, in other words, an
>>> index which points to document locations, with these columns: word,
>>> document ID and possibly timestamp.
>>>
>>> Given a word, how will I know which tablet to scan to find all document
>>> IDs with the given word?
>>>
>>> If you are indexing a large database, say 50 TB, then each word may be
>>> split across multiple tablets. There may be hundreds of such tablets, each
>>> with a large number of SSTables to store the index. How will I know which
>>> tablet to search?  Is there a master index that specifies which tablet
>>> has words in the range, say, "ro to ru"?  Or do I have to look up Bloom
>>> filters for every tablet?
>>>
>>> Kevin
>>>
>
> --
> Ioannis Konstantinou
> Research Associate, Computing Systems Laboratory
> National Technical University of Athens
> phone: +30 2107721544(internal 421)
> mobile: +30 6945992906
> Web: http://www.cslab.ntua.gr/~ikons
>
>
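
The lookup Jonathan describes above (words as row keys, regions bounded by
startRow and endRow) can be illustrated with a small standalone sketch. This
is not HBase client code: the region boundaries, the function names
find_region and build_inverted_index, and the sample documents are all
hypothetical, and plain Python string comparison stands in for HBase's
byte-wise row key ordering. In HBase itself the META table performs this
range lookup for you.

```python
from bisect import bisect_right

# Hypothetical region start keys, kept sorted. Region i covers
# [start_keys[i], start_keys[i + 1]); the empty string marks the first
# region, which begins at the lowest possible key.
REGION_START_KEYS = ["", "ha", "ro", "ru"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key strictly greater than row_key;
    # the region just before that position holds the key.
    return bisect_right(start_keys, row_key) - 1


def build_inverted_index(documents):
    """Map each word (used as the row key) to the sorted IDs of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        # set() so a word repeated within one document is recorded once.
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


if __name__ == "__main__":
    index = build_inverted_index({1: "Round table", 2: "round robin"})
    region = find_region("round")
    print("region for 'round':", region)
    print("documents containing 'round':", index["round"])
```

Because the start keys are sorted, one binary search is enough to find the
region for a word, so there is no need to consult a Bloom filter on every
tablet just to locate the right one.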
