Re: Katta for secondary index?

tim robertson Tue, 23 Jun 2009 01:14:23 -0700

Hi Fleming

I am pretty much a novice at HBase, but I asked a similar question a
while ago: whether to put the data in the Lucene index, or to index
the keys only and then get the data with a series of getByKey(...)
operations.  It seems there are no hard and fast rules for this, so I
think it is worth trying what you propose.  It is certainly what we
are playing with at the moment, but it is not live.
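
Tim's two options differ in what the index stores. The following is a minimal sketch of the keys-only variant in plain Java, with no HBase or Lucene dependency: a TreeMap stands in for the HBase table, and a map from indexed value to row keys stands in for the Lucene index. All class and method names, including `getByKey`, are hypothetical and are not HBase or Lucene APIs.

```java
import java.util.*;

/** Hypothetical sketch: a keys-only index, then one get per matching row key. */
public class KeysOnlyIndexSketch {
    // Stand-in for the HBase table: row key -> (column -> value).
    static final SortedMap<String, Map<String, String>> table = new TreeMap<>();
    // Stand-in for the Lucene index: indexed value -> row keys only, no row data.
    static final Map<String, SortedSet<String>> index = new HashMap<>();

    /** Stores a cell and records its row key under the cell's value in the index. */
    public static void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    /** Hypothetical helper playing the role of a single get-by-row-key call. */
    public static Map<String, String> getByKey(String rowKey) {
        return table.get(rowKey);
    }

    /** Step 1: ask the index for row keys. Step 2: fetch each row by its key. */
    public static List<Map<String, String>> search(String value) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String rowKey : index.getOrDefault(value, Collections.emptySortedSet())) {
            rows.add(getByKey(rowKey));
        }
        return rows;
    }

    public static void main(String[] args) {
        put("row1", "raw:CatalogueNo", "C-100");
        put("row2", "raw:CatalogueNo", "C-200");
        put("row3", "raw:CatalogueNo", "C-100");
        System.out.println(search("C-100").size()); // prints 2
    }
}
```

Putting the row data into the index instead would remove the second round of lookups, at the cost of a larger index that repeats the table's data.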


Cheers,

Tim

2009/6/23  <[email protected]>:
> Hello Tim,
>
> I would like to run queries by range (for example, by date) or by the
> value of a specific column.
> My plan is to build these columns (as index columns) with a mapping to the
> primary key, so that I can use a column value to locate its primary key
> and then use that key to query HBase.
> Is using BuildTableIndex the right way to do this?
>
> Fleming
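
Fleming's plan (index value → primary key, then a get on the main table) supports both range and exact-match lookups if the index is kept sorted. A minimal sketch, assuming an index table whose row key is the indexed value, a separator, and the primary key, so that all entries for a value range are contiguous in sort order, as rows are in HBase. A TreeMap stands in for that table; the class name, the separator choice, and the date format are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: index row key = value + SEP + primary key, kept sorted. */
public class RangeIndexSketch {
    // Separator assumed never to appear inside indexed values.
    static final char SEP = (char) 0;
    // Stand-in for the secondary index table: composite key -> primary row key.
    static final TreeMap<String, String> indexTable = new TreeMap<>();

    public static void addToIndex(String indexedValue, String primaryKey) {
        indexTable.put(indexedValue + SEP + primaryKey, primaryKey);
    }

    /** Primary keys whose indexed value v satisfies from <= v < to (requires from <= to). */
    public static List<String> rangeLookup(String from, String to) {
        // Each composite key sorts inside [from, to) exactly when its value does.
        return new ArrayList<>(indexTable.subMap(from, true, to, false).values());
    }

    /** Primary keys whose indexed value equals value exactly. */
    public static List<String> exactLookup(String value) {
        // All composite keys for this value start with value + SEP.
        return new ArrayList<>(indexTable.subMap(value + SEP, true, value + (char) 1, false).values());
    }

    public static void main(String[] args) {
        // ISO dates (yyyy-MM-dd) sort chronologically as plain strings.
        addToIndex("2009-06-01", "row-a");
        addToIndex("2009-06-15", "row-b");
        addToIndex("2009-06-15", "row-c");
        addToIndex("2009-07-02", "row-d");
        System.out.println(rangeLookup("2009-06-01", "2009-07-01")); // prints [row-a, row-b, row-c]
        System.out.println(exactLookup("2009-06-15"));               // prints [row-b, row-c]
    }
}
```

Each returned primary key would then be used for a get on the main table. Against a real index table, the `subMap` call corresponds roughly to a scan bounded by a start row and a stop row.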
>
>     tim robertson <timrobertson100@gmail.com>
>     To: [email protected]
>     cc: (bcc: Y_823910/TSMC)
>     Subject: Re: Katta for secondary index?
>     2009/06/23 03:45 PM
>     Please respond to hbase-user
>
> What kind of searches are you doing with the secondary indexes?  Will
> it be range queries for example or simply "give me all the records for
> this key"?
>
>
>
> On Tue, Jun 23, 2009 at 9:44 AM, tim robertson <[email protected]> wrote:
>> For BuildTableIndex:
>>
>>     // Imports assumed for this snippet (HBase 0.19/0.20 "mapred" API):
>>     // import java.io.IOException;
>>     // import org.apache.hadoop.hbase.mapred.BuildTableIndex;
>>     // import org.apache.hadoop.mapred.JobClient;
>>     // import org.apache.hadoop.mapred.JobConf;
>>
>>     BuildTableIndex bti = new BuildTableIndex();
>>     JobConf conf = new JobConf(TestBuildLucene.class);
>>     // 1 map task, 1 reduce task (one index shard); index written to
>>     // /tmp/lucene-hbase from table "occurrence", column "raw:CatalogueNo"
>>     conf = bti.createJob(conf, 1, 1, "/tmp/lucene-hbase", "occurrence",
>>         "raw:CatalogueNo");
>>     try {
>>         long time = System.currentTimeMillis();
>>         System.out.println("Starting the job input[occurrence] output[/tmp/lucene-hbase]");
>>         JobClient.runJob(conf);  // blocks until the MapReduce job finishes
>>         System.out.println("Finished in " +
>>             (1 + System.currentTimeMillis() - time) / 1000 + " secs!");
>>     } catch (IOException e) {
>>         e.printStackTrace();
>>     }
>>
>>
>> Cheers
>> Tim
>>
>>
>>
>>
>> On Tue, Jun 23, 2009 at 9:39 AM, <[email protected]> wrote:
>>> Hi,
>>>
>>> Is there a code snippet showing how to use BuildTableIndex and IndexedTable?
>>> Thank you.
>>>
>>> Fleming
>>>
>>>     [email protected]
>>>     To: [email protected]
>>>     Sent by: [email protected]
>>>     cc: (bcc: Y_823910/TSMC)
>>>     Subject: Re: Katta for secondary index?
>>>     2009/06/23 01:39 PM
>>>     Please respond to hbase-user
>>>
>>> On Mon, Jun 22, 2009 at 5:46 PM, <[email protected]> wrote:
>>>
>>>> Hi there,
>>>>
>>>> HBase accesses data only by key, right?
>>>> Is anybody using HBase + Katta (for a secondary index)? Does it work?
>>>
>>>
>>>
>>> Katta works, but it's just a means of distributing Lucene indices.  You
>>> need to make the indices first.  Have you checked out the BuildTableIndex
>>> mapreduce job in HBase?  It indexes table contents.  The index is sharded
>>> by the number of reducers you run.  Perhaps you can have Katta deploy
>>> this product for you?  Perhaps the indices it makes are not what you want
>>> for secondary lookups, but you could adapt BuildTableIndex?
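
St.Ack notes that the index is sharded by the number of reducers. A minimal sketch of how rows would spread over shards, assuming the job relies on Hadoop's default hash partitioning, which sends a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The real job's keys are not Java Strings, so actual shard numbers will differ; the class and method names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of hash-partitioning row keys into index shards. */
public class ShardSketch {
    /** Same formula as Hadoop's HashPartitioner: non-negative hash modulo reducer count. */
    public static int shardFor(String rowKey, int numReduceTasks) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups row keys by shard; each group would become one index shard. */
    public static Map<Integer, List<String>> partition(List<String> rowKeys, int numReduceTasks) {
        Map<Integer, List<String>> shards = new TreeMap<>();
        for (String key : rowKeys) {
            shards.computeIfAbsent(shardFor(key, numReduceTasks), s -> new ArrayList<>()).add(key);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        // With 2 reducers, every row lands in shard 0 or shard 1.
        System.out.println(partition(keys, 2));
    }
}
```

Because the shard depends only on the row key's hash, a search by indexed value cannot know in advance which shard holds a match and has to be sent to every shard; fanning a query out over deployed shards is the kind of work a system like Katta takes on.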
>>>
>>> Does the table change frequently?  Is a batch job to redo the index OK
>>> with you?  In TRUNK you could run a scan that only finds records created
>>> after a certain date, so you could add incremental indices and then do
>>> the full build of the index at some lesser frequency.
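
The incremental scheme St.Ack describes can be sketched as a base index from a full build plus small delta indexes built only from rows created after the last build, with queries merging hits from all of them. A minimal illustration in plain Java; a list of rows stands in for the table scan, and all names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: full index build plus incremental "created after" deltas. */
public class IncrementalIndexSketch {
    /** A table row: key, indexed value, and creation timestamp in millis. */
    public static final class Row {
        final String key;
        final String value;
        final long created;

        public Row(String key, String value, long created) {
            this.key = key;
            this.value = value;
            this.created = created;
        }
    }

    /** Builds value -> row keys for rows created strictly after {@code since}. */
    public static Map<String, Set<String>> buildIndex(List<Row> rows, long since) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Row r : rows) {
            if (r.created > since) { // the "only records created after a date" filter
                index.computeIfAbsent(r.value, v -> new TreeSet<>()).add(r.key);
            }
        }
        return index;
    }

    /** Queries the base index and every delta, merging the hits. */
    public static Set<String> search(String value, List<Map<String, Set<String>>> indexes) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> idx : indexes) {
            hits.addAll(idx.getOrDefault(value, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
                new Row("r1", "red", 100), new Row("r2", "blue", 200)));
        long lastFullBuild = 250;
        Map<String, Set<String>> base = buildIndex(rows, Long.MIN_VALUE); // full build
        rows.add(new Row("r3", "red", 300));                             // created after the build
        Map<String, Set<String>> delta = buildIndex(rows, lastFullBuild); // only r3
        System.out.println(search("red", Arrays.asList(base, delta)));  // prints [r1, r3]
    }
}
```

This only picks up new rows: updates or deletes of rows already in the base index stay stale until the next full rebuild, which is why the full build still has to run at some lesser frequency.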
>>>
>>> There is also the experimental tableindexed subclass of HBase that will
>>> keep up a secondary table as an index, using transactional HBase so that
>>> the insert into the primary and secondary tables is done as a single
>>> transaction (it's not yet in TRUNK but should be there soon).
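
The point of doing both inserts in one transaction is that the index never disagrees with the primary table, including when a row's indexed value changes and its old index entry must go. A minimal in-process sketch of that invariant; a single lock stands in for the transaction, and none of these names are the tableindexed contrib's API.

```java
import java.util.*;

/** Hypothetical sketch: primary table and index table updated together. */
public class IndexedTableSketch {
    private static final char SEP = (char) 0;                           // assumed absent from values
    private final Map<String, String> primary = new HashMap<>();        // row key -> indexed value
    private final TreeMap<String, String> indexTable = new TreeMap<>(); // value + SEP + row key -> row key

    /** Writes the row and its index entry under one lock, dropping any stale entry. */
    public synchronized void put(String rowKey, String value) {
        String old = primary.put(rowKey, value);
        if (old != null) {
            indexTable.remove(old + SEP + rowKey); // old value must no longer point at this row
        }
        indexTable.put(value + SEP + rowKey, rowKey);
    }

    /** Row keys whose indexed value equals {@code value}. */
    public synchronized List<String> lookup(String value) {
        return new ArrayList<>(indexTable.subMap(value + SEP, true, value + (char) 1, false).values());
    }

    public static void main(String[] args) {
        IndexedTableSketch t = new IndexedTableSketch();
        t.put("row1", "open");
        t.put("row2", "open");
        t.put("row1", "closed"); // update: row1 must leave the "open" entries
        System.out.println(t.lookup("open"));   // prints [row2]
        System.out.println(t.lookup("closed")); // prints [row1]
    }
}
```

An in-process lock only works because both maps live in one JVM; with two HBase tables served by different region servers, keeping them in step is what the transactional layer is for.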
>>>
>>> St.Ack
>>>
>>>
>>>> We just want to transfer part of our Oracle table data to HBase
>>>> for parallel computing.
>>>> Any suggestions would be appreciated!
>>>> Thank you
>>>>
>>>> Fleming
>>>>
>>>>
>>>
>>>> ---------------------------------------------------------------------------
>>>>                                                         TSMC PROPERTY
>>>>  This email communication (and any attachments) is proprietary information
>>>>  for the sole use of its intended recipient. Any unauthorized review, use
>>>>  or distribution by anyone other than the intended recipient is strictly
>>>>  prohibited.  If you are not the intended recipient, please notify the
>>>>  sender by replying to this email, and then delete this email and any
>>>>  copies of it immediately. Thank you.
>>>> ---------------------------------------------------------------------------
