Yes, the coprocessors potentially cross RS boundaries. No, the index is not co-located with the main table. Take a look at the link I sent as that should be able to answer a lot of questions.
Thanks,
James

On Mon, Jan 20, 2014 at 11:03 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> James,
>
> Ok…
>
> It's been a while since we talked about this…
>
> While the index is in a separate table, is that table being split and
> collocated with the main table?
>
> If you’re using the coprocessor to maintain the index, that would imply
> you’re crossing RS boundaries if your index is truly orthogonal.
>
> Is this what you’re doing?
>
> On Jan 20, 2014, at 11:32 AM, James Taylor <jtay...@salesforce.com> wrote:
>
> > Mike,
> > Yes, you're mistaken:
> > - secondary indexes in Phoenix are orthogonal to the base table. They're in
> >   a separate table (http://phoenix.incubator.apache.org/secondary_indexing.html).
> > - Phoenix has joins. They're in our master branch with a release scheduled
> >   for next month
> > - numeric strings? Not a use case for indexing numeric data? Have you ever
> >   seen a number used as an ID?
> > Thanks,
> > James
> >
> > On Mon, Jan 20, 2014 at 8:50 AM, Michael Segel <michael_se...@hotmail.com> wrote:
> >
> >> Indexes tend to be orthogonal to the base table, not to mention if you’re
> >> using an inverted table for an index, your index table would be much
> >> thinner than your base table.
> >>
> >> Having said that, the solution proposed by Yu, Taylor and others only
> >> works if you want to use the index to help on server side filtering and
> >> misses the boat on the larger and broader picture of improving query
> >> optimization and joins.
> >>
> >> HINT: Unless I am mistaken… until you treat the index as orthogonal to the
> >> base table, you will always lag performance of traditional MPP DWs like
> >> Informix XPS. (Now part of IBM’s IM pillar)
> >>
> >> In addition, until you fix coprocessors in general, you will have
> >> scalability and performance issues.
> >> (Note that you can write a coprocessor to create a sandbox and separate
> >> the co-process from the RS jvm, however it would be better if it were part
> >> of the underlying coprocessor code.)
> >>
> >> The current implementation makes joins worthless.
> >> (Note that in prior discussions, Phoenix doesn’t do joins…)
> >> Here’s why:
> >> In order to do a join, if you use the proposed index, you have to first
> >> reduce each index into a single, sort ordered set. Then you can take the
> >> intersection of the index result sets. The final set would be in sort
> >> order and a subset of the total rows. You can then fetch the rows and still
> >> do a server side filter before returning the ultimate result set.
> >>
> >> It's that first step of reducing each result set into a single sort
> >> ordered set that takes a lot of effort.
> >>
> >>
> >> On a side note…. there’s been some mention of ordering floats. Again, just
> >> a word of caution… there isn’t a really strong use case for indexing
> >> numeric data types. period. And to be very, very clear, there is a
> >> distinction between numeric strings and numeric data types.
> >>
> >> -Mike
> >>
> >> PS. Because of my role as a consultant, I am very, very limited in what I
> >> can say and contribute. I don’t own my work product, my clients do. Take
> >> what I say with a grain of salt. I’m just a skinny little boy from
> >> Cleveland Ohio, come to chase your beers and drink your women… ;-)
> >>
> >> On Jan 9, 2014, at 10:48 AM, James Taylor <jtay...@salesforce.com> wrote:
> >>
> >>> IMHO, it would be valuable if the design considered both a global
> >>> indexing solution and a local indexing solution. Both are useful in
> >>> different circumstances.
> >>> The global indexing design plus the
> >>> application integration points could be derived from Jesse's work with
> >>> his reference implementation in Phoenix - the global indexing code has
> >>> no Phoenix dependencies and clearly defined integration points.
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> On Jan 9, 2014, at 6:36 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:
> >>>
> >>>> Yes, that was a big concern I had as well.
> >>>>
> >>>> It's not clear how that will work with a large number of indexes; if people
> >>>> have one index, they will want more than one. To not plan for that seems
> >>>> like an incomplete implementation to me. In a horizontally scalable system
> >>>> like HBase, lots of buddy regions aren't going to work out well.* Once we
> >>>> have regions that cannot be collocated, the extra RPC time starts to be the
> >>>> biggest factor (as the doc points out) and we are back to what Phoenix is
> >>>> already doing**.
> >>>>
> >>>> But I'm probably missing something here in what makes it different?
> >>>>
> >>>> For folks that haven't been following the issue some high-level "how it all
> >>>> kinda works" would be helpful from the championing committers; that's a long
> >>>> doc to get through and grok :). How similar is this to the work currently
> >>>> done by the existing indexing implementations (huawei, Phoenix, ngdata)? The
> >>>> doc doesn't really nail down the interactions, but instead jumps right in
> >>>> after describing why SI should be added.
> >>>>
> >>>> Agree this would be super useful, but don't want to waste too much work
> >>>> reinventing the wheel or doing the wrong thing. Further, this impl quickly
> >>>> starts to lead down the query optimization path, which gets HBase away from
> >>>> its core "be a great byte store".
> >>>>
> >>>> Like I said, I'm all for secondary indexes in HBase and think this is a
> >>>> great push. I don't mean to rain on any parades.
> >>>>
> >>>> - jesse
> >>>>
> >>>> * but a smart way to specify region collocation? That I can get behind as
> >>>> it would unify a couple different indexing impls (e.g. Phoenix would
> >>>> consider using it to help make indexing faster - RPCs do suck).
> >>>>
> >>>> ** for instance, the doc talks about how to implement indexing for
> >>>> floats... That might be a default impl, but for use cases like Phoenix this
> >>>> would break all our current encodings. We handled this in the indexing impl
> >>>> by making the builder pluggable for different use cases to support
> >>>> different encodings. I feel like a lot of the code for this kind of SI
> >>>> impl is already in Phoenix and has been working and fast for several months
> >>>> now; it's surprisingly tricky, especially with the delete cases and
> >>>> timestamp manipulation issues.
> >>>>
> >>>>
> >>>> On Thursday, January 9, 2014, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> >>>> wrote:
> >>>>
> >>>>> Could you explain how the 1-1 association between user and index table
> >>>>> regions is maintained? I wasn't able to understand fully from the document.
> >>>>>
> >>>>> ----- Original Message -----
> >>>>> From: Ted Yu <dev@hbase.apache.org>
> >>>>> To: dev@hbase.apache.org
> >>>>> At: Jan 8, 2014 3:41:40 PM
> >>>>>
> >>>>> Hi,
> >>>>> Secondary index support is a frequently requested feature.
> >>>>>
> >>>>> Please find the updated design doc here:
> >>>>>
> >>>>> https://issues.apache.org/jira/secure/attachment/12621909/SecondaryIndex%20Design_Updated_2.pdf
> >>>>>
> >>>>> HBASE-9203 is the umbrella JIRA.
> >>>>>
> >>>>> Implementation patch was attached to HBASE-10222
> >>>>>
> >>>>> Thanks to Rajesh who works on this feature.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> -------------------
> >>>> Jesse Yates
> >>>> @jesse_yates
> >>>> jyates.github.com
> >>>
> >>
> >>
> >