What is generally of interest: RLI or a global index? I know it depends on the use case, but is there a common need?
On Fri, Jan 3, 2014 at 4:31 PM, Anoop John <anoop.hb...@gmail.com> wrote:

> A proportional difference in time taken, wrt increase in # RSs (keeping
> No#rows matching values constant), would be what is of utmost interest.
>
> -Anoop-
>
> On Fri, Jan 3, 2014 at 3:49 PM, rajeshbabu chintaguntla <
> rajeshbabu.chintagun...@huawei.com> wrote:
>
> > Here are some performance numbers with RLI.
> >
> > No Region servers : 4
> > Data per region   : 2 GB
> >
> > Regions/RS | Total regions | Blocksize(kb) | No#rows matching values | Time taken(sec)
> > 50         | 200           | 64            | 199                     | 102
> > 50         | 200           | 8             | 199                     | 35
> > 100        | 400           | 8             | 350                     | 95
> > 200        | 800           | 8             | 353                     | 153
> >
> > Without secondary index scan is taking in hours.
> >
> > Thanks,
> > Rajeshbabu
> > ________________________________________
> > From: Anoop John [anoop.hb...@gmail.com]
> > Sent: Friday, January 03, 2014 3:22 PM
> > To: user@hbase.apache.org
> > Subject: Re: secondary index feature
> >
> > >Is there any data on how RLI (or in particular Phoenix) query throughput
> > >correlates with the number of region servers assuming homogeneously
> > >distributed data?
> >
> > Phoenix is yet to add RLI. Now it is having global indexing only. Correct
> > James?
> >
> > RLI impl from Huawei (HIndex) is having some numbers wrt regions.. But I
> > doubt whether it is there large no# RSs. Do you have some data Rajesh
> > Babu?
> >
> > -Anoop-
> >
> > On Fri, Jan 3, 2014 at 3:11 PM, Henning Blohm <henning.bl...@zfabrik.de>
> > wrote:
> >
> > > Jesse, James, Lars,
> > >
> > > after looking around a bit and in particular looking into Phoenix (which I
> > > find very interesting), assuming that you want a secondary indexing on
> > > HBASE without adding other infrastructure, there seems to be not a lot of
> > > choice really: Either go with a region-level (and co-processor based)
> > > indexing feature (Phoenix, Huawei, is IHBase dead?) or add an index table
> > > to store (index value, entity key) pairs.
> > >
> > > The main concern I have with region-level indexing (RLI) is that Gets
> > > potentially require to visit all regions. Compared to global index tables
> > > this seems to flatten the read-scalability curve of the cluster. In our
> > > case, we have a large data set (hence HBASE) that will be queried (mostly
> > > point-gets via an index) in some linear correlation with its size.
> > >
> > > Is there any data on how RLI (or in particular Phoenix) query throughput
> > > correlates with the number of region servers assuming homogeneously
> > > distributed data?
> > >
> > > Thanks,
> > > Henning
> > >
> > > On 24.12.2013 12:18, Henning Blohm wrote:
> > >
> > >> All that sounds very promising. I will give it a try and let you know
> > >> how things worked out.
> > >>
> > >> Thanks,
> > >> Henning
> > >>
> > >> On 12/23/2013 08:10 PM, Jesse Yates wrote:
> > >>
> > >>> The work that James is referencing grew out of the discussions Lars and I
> > >>> had (which lead to those blog posts). The solution we implement is designed
> > >>> to be generic, as James mentioned above, but was written with all the hooks
> > >>> necessary for Phoenix to do some really fast updates (or skipping updates
> > >>> in the case where there is no change).
> > >>>
> > >>> You should be able to plug in your own simple index builder (there is
> > >>> an example in the phoenix codebase
> > >>> <https://github.com/forcedotcom/phoenix/tree/master/src/main/java/com/salesforce/hbase/index/covered/example>)
> > >>> to basic solution which supports the same transactional guarantees as HBase
> > >>> (per row) + data guarantees across the index rows. There are more details
> > >>> in the presentations James linked.
> > >>>
> > >>> I'd love to see if your implementation can fit into the framework we wrote
> > >>> - we would be happy to work to see if it needs some more hooks or
> > >>> modifications - I have a feeling this is pretty much what you guys will need
> > >>>
> > >>> -Jesse
> > >>>
> > >>> On Mon, Dec 23, 2013 at 10:01 AM, James Taylor <jtay...@salesforce.com> wrote:
> > >>>
> > >>>> Henning,
> > >>>> Jesse Yates wrote the back-end of our global secondary indexing system in
> > >>>> Phoenix. He designed it as a separate, pluggable module with no Phoenix
> > >>>> dependencies. Here's an overview of the feature:
> > >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing. The
> > >>>> section that discusses the data guarantees and failure management might be
> > >>>> of interest to you:
> > >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing#data-guarantees-and-failure-management
> > >>>>
> > >>>> This presentation also gives a good overview of the pluggability of his
> > >>>> implementation:
> > >>>> http://files.meetup.com/1350427/PhoenixIndexing-SF-HUG_09-26-13.pptx
> > >>>>
> > >>>> Thanks,
> > >>>> James
> > >>>>
> > >>>> On Mon, Dec 23, 2013 at 3:47 AM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
> > >>>>
> > >>>>> Lars, that is exactly why I am hesitant to use one of the core-level generic
> > >>>>> approaches (apart from having difficulties to identify the still active
> > >>>>> projects): I have doubts I can sufficiently explain to myself when and
> > >>>>> where they fail.
> > >>>>>
> > >>>>> With "toolbox approach" I meant to say that turning entity data into
> > >>>>> index data is not done generically but rather involving domain specific
> > >>>>> application code that
> > >>>>>
> > >>>>> - indicates what makes an index key given an entity
> > >>>>> - indicates whether an index entry is still valid given an entity
> > >>>>>
> > >>>>> That code is also used during the index rebuild and trimming (an M/R Job)
> > >>>>>
> > >>>>> So validating whether an index entry is valid means to load the entity
> > >>>>> pointed to and - before considering it a valid result - validating whether
> > >>>>> values of the entity still match with the index.
> > >>>>>
> > >>>>> The entity is written last, hence when the client dies halfway through
> > >>>>> the update you may get stale index entries but nothing else should break.
> > >>>>>
> > >>>>> For scanning along the index, we are using a chunk iterator that is, we
> > >>>>> read n index entries ahead and then do point look ups for the entities. How
> > >>>>> would you avoid point-gets when scanning via an index (as most likely,
> > >>>>> entities are ordered independently from the index - hence the index)?
> > >>>>>
> > >>>>> Something really important to note is that there is no intention to build
> > >>>>> a completely generic solution, in particular not (this time - unlike the
> > >>>>> other post of mine you responded to) taking row versioning into account.
> > >>>>> Instead, row time stamps are used to delete stale entries (old entries
> > >>>>> after an index rebuild).
> > >>>>>
> > >>>>> Thanks a lot for your blog pointers. Haven't had time to study in depth
> > >>>>> but at first glance there is lot of overlap of what you are proposing and
> > >>>>> what I ended up doing considering the first post.
> > >>>>>
> > >>>>> On the second post: Indeed I have not worried too much about
> > >>>>> transactional isolation of updates. If index update and entity update use
> > >>>>> the same HBase time stamp, the result should at least be consistent,
> > >>>>> right?
> > >>>>>
> > >>>>> Btw. in no way am I claiming originality of my thoughts - in particular I
> > >>>>> read http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html
> > >>>>> a while back.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Henning
> > >>>>>
> > >>>>> Ps.: I might write about this discussion later in my blog
> > >>>>>
> > >>>>> On 22.12.2013 23:37, lars hofhansl wrote:
> > >>>>>
> > >>>>>> The devil is often in the details. On the surface it looks simple.
> > >>>>>>
> > >>>>>> How specifically are the stale indexes ignored? Are there guaranteed to be
> > >>>>>> no races?
> > >>>>>> Is deletion handled correctly? Does it work with multiple versions?
> > >>>>>> What happens when the client dies 1/2 way through an update?
> > >>>>>> It's easy to do eventually consistent indexes. Truly consistent indexes
> > >>>>>> without transactions are tricky.
> > >>>>>>
> > >>>>>> Also, scanning an index and then doing point-gets against a main table
> > >>>>>> is slow (unless the index is very selective. The Phoenix team measured that
> > >>>>>> there is only an advantage if the index filters out 98-99% of the data).
> > >>>>>> So then one would revert to covered indexes and suddenly it is not so easy
> > >>>>>> to detect stale index entries.
> > >>>>>>
> > >>>>>> I blogged about these issues here:
> > >>>>>> http://hadoop-hbase.blogspot.com/2012/10/musings-on-secondary-indexes.html
> > >>>>>> http://hadoop-hbase.blogspot.com/2012/10/secondary-indexes-part-ii.html
> > >>>>>>
> > >>>>>> Phoenix has a (pretty involved) solution now that works around the fact
> > >>>>>> that HBase has no transactions.
> > >>>>>>
> > >>>>>> -- Lars
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Henning Blohm <henning.bl...@zfabrik.de>
> > >>>>>> To: user <user@hbase.apache.org>
> > >>>>>> Sent: Sunday, December 22, 2013 2:11 AM
> > >>>>>> Subject: secondary index feature
> > >>>>>>
> > >>>>>> Lately we have added a secondary index feature to a persistence tier
> > >>>>>> over HBASE. Essentially we implemented what is described as "Dual-Write
> > >>>>>> Secondary Index" in http://hbase.apache.org/book/secondary.indexes.html.
> > >>>>>>
> > >>>>>> I.e. while updating an entity, actually before writing the actual
> > >>>>>> update, indexes are updated. Lookup via the index ignores stale entries.
> > >>>>>> A recurring rebuild and clean out of stale entries takes care that the
> > >>>>>> indexes are trimmed and accurate.
> > >>>>>>
> > >>>>>> None of this was terribly complex to implement. In fact, it seemed like
> > >>>>>> something you could do generically, maybe not on the HBASE level itself,
> > >>>>>> but as a toolbox / utility style library.
> > >>>>>>
> > >>>>>> Is anybody on the list aware of anything useful already existing in that
> > >>>>>> space?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Henning Blohm
> > >>>>>>
> > >>>>>> *ZFabrik Software KG*
> > >>>>>>
> > >>>>>> T: +49 6227 3984255
> > >>>>>> F: +49 6227 3984254
> > >>>>>> M: +49 1781891820
> > >>>>>>
> > >>>>>> Lammstrasse 2 69190 Walldorf
> > >>>>>>
> > >>>>>> henning.bl...@zfabrik.de <mailto:henning.bl...@zfabrik.de>
> > >>>>>> Linkedin <http://www.linkedin.com/pub/henning-blohm/0/7b5/628>
> > >>>>>> ZFabrik <http://www.zfabrik.de>
> > >>>>>> Blog <http://www.z2-environment.net/blog>
> > >>>>>> Z2-Environment <http://www.z2-environment.eu>
> > >>>>>> Z2 Wiki <http://redmine.z2-environment.net>
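
For readers following the thread, the dual-write scheme described in the original post (index entries written before the entity, stale entries ignored at lookup time, a recurring trim) might look roughly like the sketch below. This is only an illustration under stated assumptions: plain Python dicts and sets stand in for the HBase entity and index tables, and the names DualWriteIndex, index_key, put, lookup, trim and fail_before_entity_write are hypothetical, not taken from any poster's code or from the HBase or Phoenix APIs. It does not address deletes, multiple versions or races, which are the open questions raised in the thread.

```python
from typing import Callable, Dict, List, Set, Tuple


class DualWriteIndex:
    """In-memory stand-in for an entity table plus one index table (hypothetical)."""

    def __init__(self, index_key: Callable[[dict], str]):
        # Domain-specific code: derives the index value from an entity.
        self.index_key = index_key
        # Stand-in for the main table: entity key -> entity.
        self.entities: Dict[str, dict] = {}
        # Stand-in for the index table: (index value, entity key) pairs.
        self.index: Set[Tuple[str, str]] = set()

    def put(self, key: str, entity: dict, fail_before_entity_write: bool = False) -> None:
        # The index entry is written first ...
        self.index.add((self.index_key(entity), key))
        if fail_before_entity_write:
            return  # simulates a client dying halfway through the update
        # ... and the entity is written last.
        self.entities[key] = entity

    def _is_valid(self, value: str, key: str) -> bool:
        # An index entry is valid only if the entity exists and still maps to this value.
        entity = self.entities.get(key)
        return entity is not None and self.index_key(entity) == value

    def lookup(self, value: str) -> List[str]:
        # Load each candidate entity (a point get) and skip stale index entries.
        candidates = sorted(k for (v, k) in self.index if v == value)
        return [k for k in candidates if self._is_valid(value, k)]

    def trim(self) -> int:
        # Recurring clean-up: drop index entries that no longer match their entity.
        stale = {(v, k) for (v, k) in self.index if not self._is_valid(v, k)}
        self.index -= stale
        return len(stale)


if __name__ == "__main__":
    idx = DualWriteIndex(lambda e: e["city"])
    idx.put("u1", {"city": "Walldorf"})
    idx.put("u1", {"city": "Berlin"})   # leaves a stale (Walldorf, u1) entry behind
    print(idx.lookup("Walldorf"), idx.lookup("Berlin"), idx.trim())
```

Because every lookup re-validates against the entity, a half-finished update only leaves extra index entries behind; the cost is one point get per candidate, which is the trade-off Lars points out for non-selective indexes.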