That sounds promising, Josh & William. Is there a performance penalty with this approach (versus traversing the rows in row key order)?

Thanks,
James
On Fri, May 16, 2014 at 8:27 AM, Josh Elser <[email protected]> wrote:

> On 5/11/14, 12:22 AM, James Taylor wrote:
>
>> @William - it's entirely possible that my HBase terminology is not mapping well to Accumulo terminology. If Accumulo has a capability not present in HBase that'll handle this, that'd be great.
>>
>> In HBase terminology, by row I mean all of the key values across all column families with the same row key (Row ID in Accumulo?). So in HBase, it doesn't work to store the index data in a separate column family for the same row, because the rows are ordered according to the data table row key. We need the rows of an index to be ordered by the row key formed by the indexed columns instead. Otherwise we have to re-sort the rows, which is more expensive than just doing a scan over the data table.
>
> (sorry for the delay, still trying to stay on top of mail from the outage)
>
> I think I know what Bill is trying to get at here, and it hinges on the fact that Accumulo doesn't require you to define the column families for a table up front (it has a default locality group into which all colfams without a defined locality group go -- this differs from HBase, where locality group == colfam).
>
> Because of this, you can use the column family and qualifier to get properly sorted index records instead of using the row key (assuming the row is just some bucket/partitioning element). Thus, you can co-locate index and data key-values within the same row if you're tricky enough with how you create the table. :)
>
>> With buddy regions, the two regions are from different tables with different row key orders. All of the data from "D" for a given region is contained in the buddy region for "I", but in a different order. We equally rely on the buddy region for "I" being in row key order according to the indexed columns (as opposed to the row key order of the data table).
>>
>> Thanks,
>> James
>>
>> On Sat, May 10, 2014 at 7:21 PM, William Slacum <[email protected]> wrote:
>>
>>> So there may be a bit of confusion with storing index and data in the same row. By "row" I just mean the logical Accumulo unit, not a "row" as in "thing in my relational table." Synonyms for "row" in this scheme are "shard" and "document partition".
>>>
>>> You can store multiple documents and indices for those documents in different column families within the same row. You then have separate readers for the indices and document data ("sources" in Iterator terms). Point and range queries are still possible in this fashion, and are made even easier if there's another level that maps terms to rows/shards/partitions. The wikisearch example is an (admittedly rough) implementation of this.
>>>
>>> I think looking at how "buddy" regions work may help clarify things, since I imagine it works similarly. If the coprocessor is just reading from a region "I" that contains index data for only region "D", then that maps pretty well to an iterator scanning index data from a column family "I" and fetching documents from a column family "D".
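A rough sketch of the layout Bill and Josh describe -- one row per shard, with documents and index entries in separate column families that locality groups keep apart on disk. This is illustrative only: the table name, colfam names ("d", "i"), and field encoding are invented here, and it targets the 1.6-era Accumulo client API that was current when this thread was written.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class ShardedIndexLayout {
    private static final String DATA_CF = "d";   // document data
    private static final String INDEX_CF = "i";  // index entries

    static void writeExample(Connector conn) throws Exception {
        conn.tableOperations().create("docs");

        // Colfams don't need to be declared up front; locality groups are
        // assigned per colfam after the fact (in HBase, group == colfam).
        Map<String, Set<Text>> groups = new HashMap<String, Set<Text>>();
        groups.put("data", new HashSet<Text>(Arrays.asList(new Text(DATA_CF))));
        groups.put("index", new HashSet<Text>(Arrays.asList(new Text(INDEX_CF))));
        conn.tableOperations().setLocalityGroups("docs", groups);

        BatchWriter bw = conn.createBatchWriter("docs", new BatchWriterConfig());
        // One row == one shard/partition; a document and its index entries
        // co-locate in that row, so a single server-side iterator sees both.
        Mutation m = new Mutation("shard_0042");
        // The document itself, keyed by doc id in the qualifier.
        m.put(DATA_CF, "doc123", new Value("{\"name\":\"smith\"}".getBytes()));
        // An index entry: the qualifier, not the row key, carries the sorted
        // field/term/docid triple -- the colfam/qualifier sorting trick Josh
        // mentions ('\0' used as a separator here).
        m.put(INDEX_CF, "name\u0000smith\u0000doc123", new Value(new byte[0]));
        bw.addMutation(m);
        bw.close();
    }
}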
>>>
>>> On Thu, May 8, 2014 at 1:09 AM, James Taylor <[email protected]> wrote:
>>>
>>>> Sorry for the delay in getting back to you - things got a bit crazy with our graduation and HBaseCon happening at the same time.
>>>>
>>>> @Josh & Bill - r.e. Co-locating indices within the same row simplifies this a bit.
>>>> The secondary indexes need to be in row key order by the indexed columns, so co-locating them in the data table wouldn't allow the lookup and range scan abilities we'd need. The advantage of the index is that you don't need to look at all the data, but can do a point lookup or range scan based on the usage of the indexed columns in a query.
>>>>
>>>> @Josh - r.e. Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest.
>>>> Typically the Ranges are merge sorted on the client, so this might require an extension to the Accumulo client.
>>>>
>>>> r.e. Next steps.
>>>>
>>>> We'd definitely need an expert on the Accumulo side to proceed. I'm happy to help on the Phoenix side - I'll post a note on our dev list too to see if there are other folks interested as well. Given the similarities between Accumulo and HBase and the abstraction Phoenix already has in place, I don't think the effort would be large to get something up and running. Maybe a phased approach would make sense: first with query support and next with secondary index support?
>>>>
>>>> Not sure where this stacks up in terms of priority for you all. At Salesforce, we saw a specific need for this with HBase, the "big data store" on top of which we'd chosen to standardize. We realized early on that we'd never get the adoption we wanted without providing a different, more familiar programming model: namely SQL. Since we were targeting support for interactive web-based applications, anything map/reduce based wasn't a fit, which led us to create Phoenix. Perhaps there are members in your community in the same boat?
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> On Fri, May 2, 2014 at 1:44 PM, Josh Elser <[email protected]> wrote:
>>>>
>>>>> On 5/1/14, 2:24 AM, James Taylor wrote:
>>>>>
>>>>>> Thanks for the explanations, Josh. This sounds very doable. Few more comments inline below.
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On Wed, Apr 30, 2014 at 8:37 AM, Josh Elser <[email protected]> wrote:
>>>>>>
>>>>>>> On 4/30/14, 3:33 AM, James Taylor wrote:
>>>>>>>
>>>>>>>> On Tue, Apr 29, 2014 at 11:57 AM, Josh Elser <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>> @Josh - it's less baked in than you'd think on the client, where the query parsing, compilation, optimization, and orchestration occur. The client/server interaction is hidden behind the ConnectionQueryServices interface, the scanning behind ResultIterator (in particular ScanningResultIterator), the DML behind MutationState, and KeyValue interaction behind KeyValueBuilder. Yes, it would require some more abstraction, though probably not too bad. On the server-side, the entry points would all be different, and that's where I'd need your insights for what's possible.
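To make that porting surface concrete, here is a hypothetical sketch of an Accumulo-backed scanning iterator. SimpleResultIterator merely stands in for the Phoenix ResultIterator abstraction James mentions (the real interface deals in Tuples and is richer); the Accumulo calls themselves are the genuine 1.6-era client API.

import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

// Simplified stand-in for Phoenix's ResultIterator abstraction.
interface SimpleResultIterator {
    Entry<Key, Value> next(); // null when the scan is exhausted
}

// Rough analogue of Phoenix's ScanningResultIterator (which wraps an HBase
// ResultScanner): the same contract, with a different engine underneath.
class AccumuloScanningResultIterator implements SimpleResultIterator {
    private final Iterator<Entry<Key, Value>> iter;

    AccumuloScanningResultIterator(Connector conn, String table, Range range)
            throws TableNotFoundException {
        Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
        scanner.setRange(range);   // start/stop row, like an HBase Scan
        iter = scanner.iterator(); // entries stream back in row key order
    }

    @Override
    public Entry<Key, Value> next() {
        return iter.hasNext() ? iter.next() : null;
    }
}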
>>>>>>>>>
>>>>>>>>> Definitely. I'm a little concerned about what's expected to be provided by the "database" (HBase, Accumulo), as I believe HBase is a little more flexible in allowing writes internally, where Accumulo has thus far said "you're gonna have a bad time".
>>>>>>>>
>>>>>>>> Tell me more about what you mean by "allowing writes internally".
>>>>>>>
>>>>>>> Haha, sorry, that was a sufficiently ominous statement with insufficient context.
>>>>>>>
>>>>>>> For discussion's sake, let's just say HBase coprocessors and Accumulo iterators are equivalent, purely in the scope of "running server-side code" (in the RegionServer/TabletServer). However, there is a notable difference in the pipeline where each of those is implemented.
>>>>>>>
>>>>>>> Coprocessors have built-in hooks that let you get updates on PUT/GET/DELETE/etc., as well as pre and post each of those operations. In other words, they provide hooks at a "high database level".
>>>>>>>
>>>>>>> Iterators tend to be much closer to the data itself, only dealing with streams of data (other iterators stacked on one another). Iterators implement versioning and visibilities, and can even implement complex searches. The downside of this approach is that iterators lack any means to safely write data _outside of the sorted Key-Value pairs in the tablet currently being processed_. It's possible to make in-tablet updates, but sorted order within a large tablet might make this difficult as well.
>>>>>>>
>>>>>>> This is why I was thinking Percolator would be a better solution, as it's meant for handling updates like this server-side. However, I imagine it would be possible, in the short term, to make some separate process between Phoenix and Accumulo which handles writes.
>>>>>>
>>>>>> Another fallback might be to do global index maintenance on the client. It'd just be more expensive, especially if you want to handle out-of-order updates (which are particularly tricky, as you have to get multiple versions of the rows to work out all the different scenarios here).
>>>>>>
>>>>>> A second fallback might be to support only local indexing. Does Accumulo have the concept of a "custom load balancer" that would allow you to co-locate two regions from different tables? The local-index feature has kind of driven some feature requests on that front for HBase - mainly callbacks when a region is split or re-located. The rows of the local index are prefixed with the region start key to keep them together and identify them.
>>>>>
>>>>> Agreed with what Bill said. Co-locating indices within the same row simplifies this a bit, IMO.
>>>>>
>>>>> <snip/>
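For reference, a minimal example of the iterator model described above: Accumulo's Filter base class hands a subclass the tablet's sorted Key-Value stream and asks it to accept or reject each entry. Note there is no hook for writing data back -- exactly the limitation Josh raises. The "prefix" option name is invented for this example.

import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

/**
 * Server-side iterator that passes through only entries whose column family
 * starts with a configured prefix. It sees the tablet purely as a sorted
 * stream of Key-Value pairs -- contrast with an HBase coprocessor, which
 * gets prePut/preGet/preDelete hooks at the operation level.
 */
public class ColumnFamilyPrefixFilter extends Filter {
    private byte[] prefix = new byte[0];

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source,
            Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        if (options.containsKey("prefix")) {
            prefix = options.get("prefix").getBytes();
        }
        // A production iterator would also override deepCopy() so the
        // configured prefix survives Filter's reflective copy.
    }

    @Override
    public boolean accept(Key k, Value v) {
        ByteSequence cf = k.getColumnFamilyData();
        if (cf.length() < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (cf.byteAt(i) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

It would be attached at scan time via an IteratorSetting (e.g. new IteratorSetting(30, "cfPrefix", ColumnFamilyPrefixFilter.class) with addOption("prefix", "i")) passed to scanner.addScanIterator(...).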
>>>>>>>>
>>>>>>>> There's not a lot of hard/fast requirements. Most of what Phoenix does is optimize performance by leveraging the capabilities of the server. In terms of hard/fast requirements, these come to mind:
>>>>>>>> - data is returned in row key order from range scans
>>>>>>>> - a scan may set a start key/stop key to do a range scan
>>>>>>>> - a row key may be composed of arbitrary bytes
>>>>>>>> - a client may "pre-split" a table by providing the region boundaries at table create time (we rely on this for salting to prevent hotspotting: http://phoenix.incubator.apache.org/salted.html)
>>>>>>>> - the client has access to the region boundaries of a table (this allows for better parallelization)
>>>>>>>> - the client may chunk up a scan into smaller, multiple scans and run them in parallel
>>>>>>>> Some of these may be a bit squishy, as there may be existing machinery already in your client programming model that could be leveraged. The client API of HBase, for example, does not provide the ability out of the box to parallelize a scan, so this is something Phoenix had to add on top (through chunking up scans at or within region boundaries).
>>>>>>>
>>>>>>> All of these look fine. The Accumulo BatchScanner does that parallelization for you, which is really nice (handling tablet migration and all that fun stuff transparently).
>>>>>>
>>>>>> That's nice that Accumulo has this built-in. Does it allow the client to specify the split points for the scan in some way?
>>>>>
>>>>> Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest. You can be efficient by structuring your data so that you don't touch every tabletserver for every query -- this seems to be what's being suggested.
>>>>>
>>>>> <snip/>
>>>>>
>>>>> What do you think is next, James?
>>>>>
>>>>> I know I won't have a lot of time to devote to heavy development with what I've already signed up for in the next few months, but I'd still like to try to help out where possible. Is anyone else on the Accumulo side interested in getting involved?
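Finally, a short sketch of the client-side pieces referenced in the requirements list above: pre-splitting a table by handing Accumulo the tablet boundaries, and fanning a query out over multiple Ranges with a BatchScanner. The instance name, ZooKeeper host, table, and split points are all placeholders.

import java.util.Arrays;
import java.util.Map.Entry;
import java.util.TreeSet;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class ParallelScanExample {
    public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181")
                .getConnector("user", new PasswordToken("secret"));

        // Counterpart to HBase pre-splitting (what Phoenix salting relies
        // on): hand Accumulo the tablet boundaries up front. Assumes the
        // "data" table already exists.
        TreeSet<Text> splits = new TreeSet<Text>(
                Arrays.asList(new Text("g"), new Text("n"), new Text("t")));
        conn.tableOperations().addSplits("data", splits);

        // A BatchScanner takes many Ranges and fans them out across tablet
        // servers itself; the client needn't chunk scans at split points.
        BatchScanner bs = conn.createBatchScanner("data", Authorizations.EMPTY, 4);
        bs.setRanges(Arrays.asList(
                new Range("a", "f"),        // range scan: start/stop row
                Range.exact("row-12345"))); // point lookup
        for (Entry<Key, Value> e : bs) {
            // Note: entries come back unordered across ranges; the global
            // row-key ordering Phoenix wants would need a client-side merge
            // sort, per James's comment above.
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        bs.close();
    }
}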
