Thanks for your explanation, Gary. Consider my case, where the indexed values can repeat. So are you saying that I should edit the IndexKeyGenerator so that, instead of storing (column -> rowkey), it stores (column -> rowkey1, rowkey2) as different timestamps? If yes, is that a good way?
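
For comparison, here is a minimal sketch of the alternative Gary describes in the quoted message below: appending the original row key to the indexed value so that every index row key is unique, and then finding all matches with one range scan. This is plain Java, not the HBase API. The class name, the separator, and the TreeMap standing in for the sorted index table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustration only: a sorted map stands in for the secondary index table.
class CompositeIndexSketch {
    // Hypothetical separator; assumes indexed values never contain '\u0000'.
    static final char SEP = '\u0000';

    // index row key (value + SEP + original row key) -> original row key
    final TreeMap<String, String> indexTable = new TreeMap<>();

    // Build a unique index key even when several rows share the same value.
    static String indexKey(String value, String rowKey) {
        return value + SEP + rowKey;
    }

    void put(String value, String rowKey) {
        indexTable.put(indexKey(value, rowKey), rowKey);
    }

    // Range scan over every index key that starts with value + SEP.
    List<String> lookup(String value) {
        String start = value + SEP;               // inclusive lower bound
        String stop = value + (char) (SEP + 1);   // exclusive upper bound
        return new ArrayList<>(indexTable.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.put("blue", "row7");
        idx.put("blue", "row1");
        idx.put("bluebird", "row3");
        System.out.println(idx.lookup("blue")); // prints [row1, row7]
    }
}
```

With this layout each repeated value gets its own index row, so nothing depends on timestamps or on how many versions the index column keeps; the cost is that a lookup becomes a short scan instead of a single get.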

On Mon, Aug 17, 2009 at 10:53 PM, Gary Helmling <[email protected]> wrote:
> When defining the IndexSpecification for your table, you can pass your
> own implementation of
> org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator.
>
> This allows you to control how the row keys are generated for the
> secondary index table. For example, you could append the original
> table's row key to the indexed value to ensure uniqueness in
> referencing the original rows.
>
> When you create an indexed scanner, the secondary index code opens and
> wraps a scanner on the secondary index table, based on the start row
> you specify (the indexed value you're looking up). It applies any
> filter passed to rows on the secondary index table, so make sure
> anything you want to filter on is listed in the "indexed columns" in
> your IndexSpecification.
>
> For any rows returned by the wrapped scanner, the client code then
> does a get for the original table record (the original row key is
> stored in the "__INDEX__" column family I think).
>
> So in total, when using secondary indexes, you wind up with 1 scan + N
> gets to look at N rows.
>
> At least, this was my understanding of how things worked as of 0.19.
> I'm actually moving indexing into my app layer as I update to 0.20.
>
> Hope this helps.
>
> --gh
>
>
> On Mon, Aug 17, 2009 at 1:00 PM, Jonathan Gray<[email protected]> wrote:
> > I'm actually unsure about that. Look at the code or experiment.
> >
> > Seems to me that there would be a uniqueness requirement, otherwise what do
> > you expect the behavior to be? A get can only return a single row, so
> > multiple index hits doesn't really make sense.
> >
> > Clint? You out there? :)
> >
> > JG
> >
> > bharath vissapragada wrote:
> >>
> >> I got it ... I think this is definitely useful in my app because iam
> >> performing a full table scan everytime for selecting the rowkeys based on
> >> some column values .
> >>
> >> BUT ..
> >>
> >> we can have more than one rowkey for the same column value .Can you please
> >> tell me how they are stored .
> >>
> >> Thanks in advance
> >>
> >> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray <[email protected]> wrote:
> >>
> >>> It's not an actual hash or btree index, but rather secondary indexes in
> >>> HBase are implemented by creating an additional HBase table.
> >>>
> >>> If I have a table "users" (row key is userid) with family "data" and column
> >>> "email", and I want to index the value in that column...
> >>>
> >>> I can create a table "users_email" where the row key is the email address
> >>> (value from the column in "users" table) and a single column that contains
> >>> the userid.
> >>>
> >>> Doing an "index lookup" would mean doing a get on "users_email" and then
> >>> using that userid to do a lookup on the "users" table.
> >>>
> >>> IndexedTable does this transparently, but still does require two queries.
> >>> So it's slower than a single query, but certainly faster than a full table
> >>> scan.
> >>>
> >>> If you need hash-level performance on the index lookup, there are lots of
> >>> solutions outside of HBase that would work... In-memory Java HashMap, Tokyo
> >>> Cabinet on-disk HashMaps, BerkeleyDB, etc... If you need full-text indexing,
> >>> you can use Lucene or the like.
> >>>
> >>> Make sense?
> >>>
> >>> JG
> >>>
> >>>
> >>> bharath vissapragada wrote:
> >>>
> >>>> But i have read somewhere that Secondary indexes are somewhat slow compared
> >>>> to normal Hbase tables ..Does that effect the performance ?
> >>>>
> >>>> Also do you know the type of index created on the column(i mean Hash type
> >>>> or Btree etc)
> >>>>
> >>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov <[email protected]> wrote:
> >>>>
> >>>> Hi!
> >>>>>
> >>>>> As far as I understand you are talking about the secondary indexes. Yes,
> >>>>> they can be used to quickly get the rowkey by a value in the indexed
> >>>>> column.
> >>>>>
> >>>>> --Kirill
> >>>>>
> >>>>>
> >>>>> bharath vissapragada wrote:
> >>>>>
> >>>>> Hi all ,
> >>>>>>
> >>>>>> I have gone through the IndexedTableAdmin classes in Hbase 0.19.3 API .. I
> >>>>>> have seen some methods used to create an Indexed Table (on some column).. I
> >>>>>> have some doubts regarding the same ...
> >>>>>>
> >>>>>> 1) Are these somewhat similar to Hash indexes(in RDBMS) where i can easily
> >>>>>> lookup a column value and find it's corresponding rowkey(s)
> >>>>>> 2) Can i find any performance gain when i use IndexedTable to search for a
> >>>>>> paritcular column value .. instead of scanning an entire normal HTable ..
> >>>>>>
> >>>>>> Kindly clarify my doubts
> >>>>>>
> >>>>>> Thanks in advance
