That paper looks great. Add a link to it here, http://wiki.apache.org/hadoop/HBase/Articles? Is the software available?
Thanks,
St.Ack

On Mon, May 17, 2010 at 10:22 AM, Ioannis Konstantinou <ik...@cslab.ntua.gr> wrote:
> Hi,
>
> you can also read the following paper
> http://www.cslab.ntua.gr/~ikons/distributed_indexing_of_webscale_datasets_for_the_cloud_mdac_2010_cr.pdf
> where we present an inverted index system based on HBase (both the index and
> the content are served through HBase, and indexing is performed through
> Hadoop MapReduce functions)
>
> On 17/5/2010 6:44 PM, Jonathan Gray wrote:
>>
>> Kevin,
>>
>> You would want to make your row keys the words.
>>
>> HBase defines its tablets (called Regions) by the startRow and endRow.
>> So as you say, a given region may contain "ro to ru". Looking up the word
>> "round" would use that region. This is handled automatically by the META
>> table.
>>
>> For a refresher on these concepts, check out the BigTable paper. There
>> have also been some discussions about inverted word indexes on this mailing
>> list, though I don't have links.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: Kevin Apte [mailto:technicalarchitect2...@gmail.com]
>>> Sent: Monday, May 17, 2010 1:07 AM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: Inverted word index...
>>>
>>> Consider a search system with an inverted word index, in other words, an
>>> index which points to document location, with these columns: word,
>>> document ID and possibly timestamp.
>>>
>>> Given a word, how will I know which tablet to scan to find all document
>>> IDs with the given word?
>>>
>>> If you are indexing a large database, say 50 TB, then each word may be
>>> split across multiple tablets. There may be hundreds of such tablets, each
>>> with a large number of SSTables to store the index. How will I know which
>>> tablet to search? Is there a master index that specifies which tablet
>>> has words with range say "ro to ru"? Or do I have to look up Bloom
>>> Filters for every tablet?
>>>
>>> Kevin
>>>
>
> --
> Ioannis Konstantinou
> Research Associate, Computing Systems Laboratory
> National Technical University of Athens
> phone: +30 2107721544(internal 421)
> mobile: +30 6945992906
> Web: http://www.cslab.ntua.gr/~ikons
>
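
For anyone following the thread, the region lookup Jonathan describes can be pictured as a binary search over sorted region start keys. The Python below is only a rough sketch of that idea, not HBase code: REGION_START_KEYS and find_region are hypothetical names, and the key ranges are invented for illustration. In HBase itself the client resolves the region through the META table, so no application code like this is needed.

```python
import bisect

# Hypothetical, sorted start keys of four regions (an assumption for this
# sketch). Each region covers [start_key, next_start_key); the first region
# starts at the empty key and the last one is open-ended.
REGION_START_KEYS = ["", "g", "ro", "ru"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return (index, start_key, end_key) of the region holding row_key.

    end_key is None for the last region, which has no upper bound.
    """
    # bisect_right returns the position of the first start key that is
    # strictly greater than row_key, so the owning region is the one before.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    end_key = start_keys[idx + 1] if idx + 1 < len(start_keys) else None
    return idx, start_keys[idx], end_key


print(find_region("round"))  # (2, 'ro', 'ru')
```

With words as row keys, "round" sorts between "ro" and "ru", so a single region is consulted and no per-region Bloom filter check is required to locate it.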