Not at all. I am very interested to hear your results!
2009/6/23 <[email protected]>:
> Tim,
> Thank you very much for your help ^_^
>
> Fleming
>
> tim robertson <timrobertson100@gmail.com>
> 2009/06/23 04:41 PM
> To: [email protected]
> cc: (bcc: Y_823910/TSMC)
> Subject: Re: Katta for secondary index?
> Please respond to hbase-user
>
> Hi,
>
> Yes it will, and then you need to copy it out of HDFS for Lucene to
> read it. If it is a huge index, this is where Katta would be useful,
> as it will deploy across a cluster of Lucene machines for you (by
> copying out of HDFS). I would recommend as a start to build an index
> of a sample of your data, copy it out manually, and start up Lucene
> to check that it works. Then try and guess how big the index would be
> if you did it on all your data.
>
> Again - I am pretty novice though...
>
> Cheers
>
> Tim
>
> 2009/6/23 <[email protected]>:
>> Hi Tim,
>>
>> Using map/red to Build Table Index, will it output an index file in
>> HDFS?
>> How to use it efficiently when it becomes very large?
>> Will it be a bottleneck when many parallel programs access that
>> large index file?
>> Any ideas?
>>
>> Fleming
>>
>> tim robertson <timrobertson100@gmail.com>
>> 2009/06/23 04:13 PM
>> To: [email protected]
>> cc: (bcc: Y_823910/TSMC)
>> Subject: Re: Katta for secondary index?
>> Please respond to hbase-user
>>
>> Hi Fleming
>>
>> I am pretty much a novice at HBase, but I asked a similar question a
>> while ago - the question was whether to put the data in the Lucene
>> index or to index the keys only and then get the data with a series
>> of getByKey(...) operations. It seems there are no hard and fast
>> rules for this, so I think it is worth trying what you propose.
>> It is certainly what we are playing with at the moment, but it is
>> not live.
>>
>> Cheers,
>>
>> Tim
>>
>> 2009/6/23 <[email protected]>:
>>> Hello Tim,
>>>
>>> I would like to do queries by range (maybe by date) or by a specific
>>> family column value. I would build these family columns (as index
>>> columns) with a primary key mapping, so that I can use a family
>>> column value to locate its primary key and then use that key to
>>> query HBase.
>>> Is it the right way if I try to use BuildTableIndex?
>>>
>>> Fleming
>>>
>>> tim robertson <timrobertson100@gmail.com>
>>> 2009/06/23 03:45 PM
>>> To: [email protected]
>>> cc: (bcc: Y_823910/TSMC)
>>> Subject: Re: Katta for secondary index?
>>> Please respond to hbase-user
>>>
>>> What kind of searches are you doing with the secondary indexes? Will
>>> it be range queries for example or simply "give me all the records
>>> for this key"?
>>>
>>> On Tue, Jun 23, 2009 at 9:44 AM, tim robertson<[email protected]>
>>> wrote:
>>>> For build table index:
>>>>
>>>>     BuildTableIndex bti = new BuildTableIndex();
>>>>     JobConf conf = new JobConf(TestBuildLucene.class);
>>>>     conf = bti.createJob(conf, 1, 1, "/tmp/lucene-hbase",
>>>>         "occurrence", "raw:CatalogueNo");
>>>>     try {
>>>>         long time = System.currentTimeMillis();
>>>>         System.out.println(
>>>>             "Starting the job input[occurrence] output[/tmp/lucene-hbase]");
>>>>         JobClient.runJob(conf);
>>>>         System.out.println("Finished in "
>>>>             + (1+System.currentTimeMillis()-time)/1000 + " secs!");
>>>>     } catch (IOException e) {
>>>>         e.printStackTrace();
>>>>     }
>>>>
>>>> Cheers
>>>> Tim
>>>>
>>>> On Tue, Jun 23, 2009 at 9:39 AM, <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any code snippet of how to use BuildTableIndex and
>>>>> IndexedTable?
>>>>> Thank you.
>>>>>
>>>>> Fleming
>>>>>
>>>>> [email protected]
>>>>> Sent by: [email protected]
>>>>> 2009/06/23 01:39 PM
>>>>> To: [email protected]
>>>>> cc: (bcc: Y_823910/TSMC)
>>>>> Subject: Re: Katta for secondary index?
>>>>> Please respond to hbase-user
>>>>>
>>>>> On Mon, Jun 22, 2009 at 5:46 PM, <[email protected]> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> HBase access data only by key, right?
>>>>>> Anybody use HBase + Katta (for secondary index)? Does it work?
>>>>>
>>>>> Katta works but it's just a means of distributing Lucene indices.
>>>>> You need to make the indices first. You've checked out the
>>>>> BuildTableIndex mapreduce job in hbase? It indexes table contents.
>>>>> The index is sharded by the number of reducers you run. Perhaps you
>>>>> can have Katta deploy this product for you? Perhaps the indices
>>>>> made are not what you want for secondary lookups but you could
>>>>> adapt BuildTableIndex?
>>>>>
>>>>> Does the table change frequently? A batch job to redo the index is
>>>>> OK with you? In TRUNK you could run a scan that only found records
>>>>> created after a certain date so you could add incremental indices
>>>>> and then do the full build of the index at some lesser frequency.
>>>>>
>>>>> There is also the experimental tableindexed subclass of hbase that
>>>>> will keep up a secondary table as an index using transactional
>>>>> hbase, so an insert into the primary and secondary table is done as
>>>>> a single transaction (it's not yet in trunk but should be here
>>>>> soon).
>>>>>
>>>>> St.Ack
>>>>>
>>>>>> We just want to transfer part of our Oracle table data to HBase
>>>>>> for multi parallel computing.
>>>>>> Any suggestions would be appreciated!
>>>>>> Thank you
>>>>>>
>>>>>> Fleming
>>>>>>
>>>>>> ---------------------------------------------------------------------------
>>>>>>                              TSMC PROPERTY
>>>>>> This email communication (and any attachments) is proprietary
>>>>>> information for the sole use of its intended recipient. Any
>>>>>> unauthorized review, use or distribution by anyone other than the
>>>>>> intended recipient is strictly prohibited. If you are not the
>>>>>> intended recipient, please notify the sender by replying to this
>>>>>> email, and then delete this email and any copies of it
>>>>>> immediately. Thank you.
>>>>>> ---------------------------------------------------------------------------
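For reference, the "index the keys only" idea from the quoted thread (look up an indexed column value, get back the primary keys, then fetch the rows by key) can be sketched roughly as below. This is only an illustration under stated assumptions: it is plain Java with in-memory maps standing in for the HBase table and for the index, and SecondaryIndexSketch, put, lookup and lookupRange are hypothetical names, not HBase, Lucene or Katta APIs. The sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for a secondary index that stores keys only.
 * Nothing here is an HBase, Lucene or Katta API.
 */
public class SecondaryIndexSketch {

    // Stand-in for the primary table: row key -> row data.
    private final Map<String, String> primaryTable = new TreeMap<>();

    // Stand-in for the index: indexed column value -> row keys.
    // A sorted map is used so that range queries (for example by date) work.
    private final NavigableMap<String, List<String>> index = new TreeMap<>();

    /** Stores a row and records its indexed column value in the index. */
    public void put(String rowKey, String indexedValue, String rowData) {
        primaryTable.put(rowKey, rowData);
        index.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    /** Exact match: all rows whose indexed column equals the given value. */
    public List<String> lookup(String indexedValue) {
        List<String> none = Collections.emptyList();
        return fetchRows(index.getOrDefault(indexedValue, none));
    }

    /** Range query over indexed values, both bounds inclusive. */
    public List<String> lookupRange(String from, String to) {
        List<String> rowKeys = new ArrayList<>();
        for (List<String> keys : index.subMap(from, true, to, true).values()) {
            rowKeys.addAll(keys);
        }
        return fetchRows(rowKeys);
    }

    // Second step: resolve each row key against the primary table,
    // analogous to a series of get-by-key operations.
    private List<String> fetchRows(List<String> rowKeys) {
        List<String> rows = new ArrayList<>();
        for (String rowKey : rowKeys) {
            rows.add(primaryTable.get(rowKey));
        }
        return rows;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        // Made-up sample rows; ISO dates sort correctly as plain strings.
        table.put("row1", "2009-06-21", "lot A");
        table.put("row2", "2009-06-22", "lot B");
        table.put("row3", "2009-06-23", "lot C");
        System.out.println(table.lookup("2009-06-22"));                    // prints [lot B]
        System.out.println(table.lookupRange("2009-06-21", "2009-06-22")); // prints [lot A, lot B]
    }
}
```

The sketch does not handle updates or deletes (re-putting a row with a new indexed value leaves a stale index entry), which is the part a batch rebuild or a transactional secondary table would have to cover.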
