Hi Fleming,

I am pretty much a novice at HBase, but I asked a similar question a while ago: whether to put the data in the Lucene index itself, or to index only the keys and then fetch the data with a series of getByKey(...) operations. It seems there are no hard and fast rules for this, so I think what you propose is worth trying. It is certainly what we are playing with at the moment, but it is not live.
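To make the "index the keys only" option a bit more concrete, here is a rough sketch. It is only an illustration under loud assumptions: the class and method names are made up, a plain in-memory map stands in for the HBase table, and a sorted map stands in for the Lucene index. It calls neither the HBase nor the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory stand-ins only: 'table' plays the role of the HBase
// table and 'index' the role of a key-only secondary index (e.g. Lucene).
class KeyOnlyIndexSketch {
    // Primary store: row key -> row contents.
    private final Map<String, String> table = new HashMap<>();
    // Secondary index: indexed column value -> row keys. Sorted, so range
    // queries (for example by date string) are possible.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    // Writes the row and records only its key in the index.
    // Note: re-putting a row with a new indexed value leaves a stale entry.
    public void put(String rowKey, String indexedValue, String row) {
        table.put(rowKey, row);
        index.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    // Step 1: ask the index for the row keys in an inclusive value range.
    public List<String> keysInRange(String from, String to) {
        List<String> keys = new ArrayList<>();
        for (List<String> hit : index.subMap(from, true, to, true).values()) {
            keys.addAll(hit);
        }
        return keys;
    }

    // Step 2: fetch each row by key, i.e. the series of getByKey(...) calls.
    public List<String> rowsInRange(String from, String to) {
        List<String> rows = new ArrayList<>();
        for (String key : keysInRange(from, to)) {
            rows.add(table.get(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        KeyOnlyIndexSketch sketch = new KeyOnlyIndexSketch();
        sketch.put("occ-1", "2009-06-01", "first record");
        sketch.put("occ-2", "2009-06-15", "second record");
        sketch.put("occ-3", "2009-07-02", "third record");
        // Range query on the indexed value, then one lookup per matching key.
        System.out.println(sketch.rowsInRange("2009-06-01", "2009-06-30"));
    }
}
```

The trade-off is the one mentioned above: the index stays small because it holds keys only, but every hit costs an extra lookup by key, whereas storing the data in the index avoids that round trip at the price of a larger index.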
Cheers,
Tim

2009/6/23 <[email protected]>:
> Hello Tim,
>
> I would like to do queries by range (maybe by date) or specific family
> column value.
> Build these family column (as index column) with primary key mapping that
> I can use these family column value to locate its primary key, then I can
> use these key to query HBase.
> Is it the right way if I try to use BuildTableIndex?
>
> Fleming
>
> tim robertson <timrobertson100@gmail.com>
> To: [email protected]
> cc: (bcc: Y_823910/TSMC)
> Subject: Re: Katta for secondary index?
> 2009/06/23 03:45 PM
> Please respond to hbase-user
>
> What kind of searches are you doing with the secondary indexes? Will
> it be range queries for example or simply "give me all the records for
> this key"?
>
> On Tue, Jun 23, 2009 at 9:44 AM, tim robertson<[email protected]> wrote:
>> For build table index:
>>
>>     BuildTableIndex bti = new BuildTableIndex();
>>     JobConf conf = new JobConf(TestBuildLucene.class);
>>     conf = bti.createJob(conf, 1, 1, "/tmp/lucene-hbase", "occurrence",
>>         "raw:CatalogueNo");
>>     try {
>>         long time = System.currentTimeMillis();
>>         System.out.println("Starting the job input[occurrence] output[/tmp/lucene-hbase]");
>>         JobClient.runJob(conf);
>>         System.out.println("Finished in " +
>>             (1+System.currentTimeMillis()-time)/1000 + " secs!");
>>     } catch (IOException e) {
>>         e.printStackTrace();
>>     }
>>
>> Cheers
>> Tim
>>
>> On Tue, Jun 23, 2009 at 9:39 AM, <[email protected]> wrote:
>>> Hi,
>>>
>>> Is there any code snippet of how to use BuildTableIndex and IndexedTable?
>>> Thank you.
>>>
>>> Fleming
>>>
>>> [email protected]
>>> Sent by: [email protected]
>>> To: [email protected]
>>> cc: (bcc: Y_823910/TSMC)
>>> Subject: Re: Katta for secondary index?
>>> 2009/06/23 01:39 PM
>>> Please respond to hbase-user
>>>
>>> On Mon, Jun 22, 2009 at 5:46 PM, <[email protected]> wrote:
>>>
>>>> Hi there,
>>>>
>>>> HBase access data only by key, right?
>>>> Anybody use HBase + Katta(for secondary index)? Does it work?
>>>
>>> Katta works but its just a means of distributing lucene indices. You need
>>> to make the indices first. You've checked out the BuildTableIndex mapreduce
>>> job in hbase? It indexes table contents. The index is sharded by the
>>> number of reducers you run. Perhaps you can have Katta deploy this product
>>> for you? Perhaps the indices made are not what you want for secondary
>>> lookups but you could adapt BuildTableIndex?
>>>
>>> Does the table change frequently? A batch job to redo the index is OK with
>>> you? In TRUNK you could run a scan that only found records created after a
>>> certain date so you could add incremental indices and then do the full
>>> build of the index at some lesser frequency.
>>>
>>> There is also the experimental tableindexed subclass of hbase that will
>>> keep up a secondary table as an index using transactional hbase so insert
>>> into primary and secondary table is done as a single transaction (Its not
>>> yet in trunk but should be here soon).
>>>
>>> St.Ack
>>>
>>>> We just want to transfer part of our Oracle table data to HBase
>>>> for multi parallel computing.
>>>> Any suggestions would be appreciated!
>>>> Thank you
>>>>
>>>> Fleming
>>>>
>>>> ---------------------------------------------------------------------------
>>>>                              TSMC PROPERTY
>>>> This email communication (and any attachments) is proprietary information
>>>> for the sole use of its intended recipient. Any unauthorized review, use
>>>> or distribution by anyone other than the intended recipient is strictly
>>>> prohibited. If you are not the intended recipient, please notify the
>>>> sender by replying to this email, and then delete this email and any
>>>> copies of it immediately. Thank you.
>>>> ---------------------------------------------------------------------------
