That sounds promising, Josh & William. Is there a performance penalty with this approach (versus traversing the rows in row key order)?

Thanks,
James
On Fri, May 16, 2014 at 8:27 AM, Josh Elser <[email protected]> wrote:

> On 5/11/14, 12:22 AM, James Taylor wrote:
>
>> @William - it's entirely possible that my HBase terminology is not mapping well to Accumulo terminology. If Accumulo has a capability not present in HBase that'll handle this, that'd be great.
>>
>> In HBase terminology, by row I mean all of the key values across all column families with the same row key (Row ID in Accumulo?). So in HBase, it doesn't work to store the index data in a separate column family for the same row, because the rows are ordered according to the data table row key. We need the rows of an index to be ordered by the row key formed by the indexed columns instead. Otherwise we have to re-sort the rows, which is more expensive than just doing a scan over the data table.
>
> (sorry for the delay, still trying to stay on top of mail from the outage)
>
> I think I know what Bill is trying to get at here, and it hinges on the fact that Accumulo doesn't require you to define the column families for a table up front (it has a default locality group into which all colfams without a defined locality group go -- this differs from HBase, where locality group == colfam).
>
> Because of this, you can use the column family and qualifier to get properly sorted index records instead of using the row key (assuming the row is just some bucket/partitioning element). Thus, you can co-locate index and data key-values within the same row if you're tricky enough with how you create the table. :)
>
>> With buddy regions, the two regions are from different tables with different row key orders. All of the data from "D" for a given region is contained in the buddy region for "I", but in a different order. We equally rely on the buddy region for "I" being in row key order according to the indexed columns (as opposed to the row key order of the data table).
>>
>> Thanks,
>> James
>>
>> On Sat, May 10, 2014 at 7:21 PM, William Slacum <[email protected]> wrote:
>>
>>> So there may be a bit of confusion with storing index and data in the same row. By "row" I just mean the logical Accumulo unit, not a "row" as in "thing in my relational table." Synonyms for "row" in this scheme are "shard" and "document partition".
>>>
>>> You can store multiple documents and indices for those documents in different column families within the same row. You then have separate readers for the indices and document data ("sources" in Iterator terms). Point and range queries are still possible in this fashion, and are made even easier if there's another level that maps terms to rows/shards/partitions. The wikisearch example is an (admittedly rough) implementation of this.
>>>
>>> I think looking at how "buddy" regions work may help clarify things, since I imagine it works similarly. If the coprocessor is just reading from a region "I" that contains index data for only region "D", then that maps pretty well to an iterator scanning index data from a column family "I" and fetching documents from a column family "D".
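A rough sketch of the layout Bill and Josh describe -- one row per shard, with documents and index entries in separate column families that locality groups keep apart on disk. This is illustrative only: the table name, colfam names ("d", "i"), and field encoding are invented here, and it targets the 1.6-era Accumulo client API that was current when this thread was written.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class ShardedIndexLayout {
    private static final String DATA_CF = "d";   // document data
    private static final String INDEX_CF = "i";  // index entries

    static void writeExample(Connector conn) throws Exception {
        conn.tableOperations().create("docs");

        // Colfams don't need to be declared up front; locality groups are
        // assigned per colfam after the fact (in HBase, group == colfam).
        Map<String, Set<Text>> groups = new HashMap<String, Set<Text>>();
        groups.put("data", new HashSet<Text>(Arrays.asList(new Text(DATA_CF))));
        groups.put("index", new HashSet<Text>(Arrays.asList(new Text(INDEX_CF))));
        conn.tableOperations().setLocalityGroups("docs", groups);

        BatchWriter bw = conn.createBatchWriter("docs", new BatchWriterConfig());
        // One row == one shard/partition; a document and its index entries
        // co-locate in that row, so a single server-side iterator sees both.
        Mutation m = new Mutation("shard_0042");
        // The document itself, keyed by doc id in the qualifier.
        m.put(DATA_CF, "doc123", new Value("{\"name\":\"smith\"}".getBytes()));
        // An index entry: the qualifier, not the row key, carries the sorted
        // field/term/docid triple -- the colfam/qualifier sorting trick Josh
        // mentions ('\0' used as a separator here).
        m.put(INDEX_CF, "name\u0000smith\u0000doc123", new Value(new byte[0]));
        bw.addMutation(m);
        bw.close();
    }
}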
>>>
>>> On Thu, May 8, 2014 at 1:09 AM, James Taylor <[email protected]> wrote:
>>>
>>>> Sorry for the delay in getting back to you - things got a bit crazy with our graduation and HBaseCon happening at the same time.
>>>>
>>>> @Josh & Bill - r.e. Co-locating indices within the same row simplifies this a bit.
>>>> The secondary indexes need to be in row key order by the indexed columns, so co-locating them in the data table wouldn't allow the lookup and range scan abilities we'd need. The advantage of the index is that you don't need to look at all the data, but can do a point lookup or range scan based on the usage of the indexed columns in a query.
>>>>
>>>> @Josh - r.e. Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest.
>>>> Typically the Ranges are merge sorted on the client, so this might require an extension to the Accumulo client.
>>>>
>>>> r.e. Next steps.
>>>>
>>>> We'd definitely need an expert on the Accumulo side to proceed. I'm happy to help on the Phoenix side - I'll post a note on our dev list too to see if there are other folks interested as well. Given the similarities between Accumulo and HBase and the abstraction Phoenix already has in place, I don't think the effort would be large to get something up and running. Maybe a phased approach would make sense: first with query support and next with secondary index support?
>>>>
>>>> Not sure where this stacks up in terms of priority for you all. At Salesforce, we saw a specific need for this with HBase, the "big data store" on top of which we'd chosen to standardize. We realized early on that we'd never get the adoption we wanted without providing a different, more familiar programming model: namely SQL. Since we were targeting support for interactive web-based applications, anything map/reduce based wasn't a fit, which led us to create Phoenix. Perhaps there are members in your community in the same boat?
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> On Fri, May 2, 2014 at 1:44 PM, Josh Elser <[email protected]> wrote:
>>>>
>>>>> On 5/1/14, 2:24 AM, James Taylor wrote:
>>>>>
>>>>>> Thanks for the explanations, Josh. This sounds very doable. Few more comments inline below.
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On Wed, Apr 30, 2014 at 8:37 AM, Josh Elser <[email protected]> wrote:
>>>>>>
>>>>>>> On 4/30/14, 3:33 AM, James Taylor wrote:
>>>>>>>
>>>>>>>> On Tue, Apr 29, 2014 at 11:57 AM, Josh Elser <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>> @Josh - it's less baked in than you'd think on the client, where the query parsing, compilation, optimization, and orchestration occur. The client/server interaction is hidden behind the ConnectionQueryServices interface, the scanning behind ResultIterator (in particular ScanningResultIterator), the DML behind MutationState, and KeyValue interaction behind KeyValueBuilder. Yes, it would require some more abstraction, though probably not too bad. On the server-side, the entry points would all be different, and that's where I'd need your insights for what's possible.
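To make that porting surface concrete, here is a hypothetical sketch of an Accumulo-backed scanning iterator. SimpleResultIterator merely stands in for the Phoenix ResultIterator abstraction James mentions (the real interface deals in Tuples and is richer); the Accumulo calls themselves are the genuine 1.6-era client API.

import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

// Simplified stand-in for Phoenix's ResultIterator abstraction.
interface SimpleResultIterator {
    Entry<Key, Value> next(); // null when the scan is exhausted
}

// Rough analogue of Phoenix's ScanningResultIterator (which wraps an HBase
// ResultScanner): the same contract, with a different engine underneath.
class AccumuloScanningResultIterator implements SimpleResultIterator {
    private final Iterator<Entry<Key, Value>> iter;

    AccumuloScanningResultIterator(Connector conn, String table, Range range)
            throws TableNotFoundException {
        Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
        scanner.setRange(range);   // start/stop row, like an HBase Scan
        iter = scanner.iterator(); // entries stream back in row key order
    }

    @Override
    public Entry<Key, Value> next() {
        return iter.hasNext() ? iter.next() : null;
    }
}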
>>>>>>>>>
>>>>>>>>> Definitely. I'm a little concerned about what's expected to be provided by the "database" (HBase, Accumulo), as I believe HBase is a little more flexible in allowing writes internally, where Accumulo has thus far said "you're gonna have a bad time".
>>>>>>>>
>>>>>>>> Tell me more about what you mean by "allowing writes internally".
>>>>>>>
>>>>>>> Haha, sorry, that was a sufficiently ominous statement with insufficient context.
>>>>>>>
>>>>>>> For discussion's sake, let's just say HBase coprocessors and Accumulo iterators are equivalent, purely in the scope of "running server-side code" (in the RegionServer/TabletServer). However, there is a notable difference in the pipeline where each of those is implemented.
>>>>>>>
>>>>>>> Coprocessors have built-in hooks that let you get updates on PUT/GET/DELETE/etc., as well as pre and post each of those operations. In other words, they provide hooks at a "high database level".
>>>>>>>
>>>>>>> Iterators tend to be much closer to the data itself, only dealing with streams of data (other iterators stacked on one another). Iterators implement versioning and visibilities, and can even implement complex searches. The downside of this approach is that iterators lack any means to safely write data _outside of the sorted Key-Value pairs in the tablet currently being processed_. It's possible to make in-tablet updates, but sorted order within a large tablet might make this difficult as well.
>>>>>>>
>>>>>>> This is why I was thinking Percolator would be a better solution, as it's meant for handling updates like this server-side. However, I imagine it would be possible, in the short term, to make some separate process between Phoenix and Accumulo which handles writes.
>>>>>>
>>>>>> Another fallback might be to do global index maintenance on the client. It'd just be more expensive, especially if you want to handle out-of-order updates (which are particularly tricky, as you have to get multiple versions of the rows to work out all the different scenarios here).
>>>>>>
>>>>>> A second fallback might be to support only local indexing. Does Accumulo have the concept of a "custom load balancer" that would allow you to co-locate two regions from different tables? The local-index feature has kind of driven some feature requests on that front for HBase - mainly callbacks when a region is split or re-located. The rows of the local index are prefixed with the region start key to keep them together and identify them.
>>>>>
>>>>> Agreed with what Bill said. Co-locating indices within the same row simplifies this a bit, IMO.
>>>>>
>>>>> <snip/>
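For reference, a minimal example of the iterator model described above: Accumulo's Filter base class hands a subclass the tablet's sorted Key-Value stream and asks it to accept or reject each entry. Note there is no hook for writing data back -- exactly the limitation Josh raises. The "prefix" option name is invented for this example.

import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

/**
 * Server-side iterator that passes through only entries whose column family
 * starts with a configured prefix. It sees the tablet purely as a sorted
 * stream of Key-Value pairs -- contrast with an HBase coprocessor, which
 * gets prePut/preGet/preDelete hooks at the operation level.
 */
public class ColumnFamilyPrefixFilter extends Filter {
    private byte[] prefix = new byte[0];

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source,
            Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        if (options.containsKey("prefix")) {
            prefix = options.get("prefix").getBytes();
        }
        // A production iterator would also override deepCopy() so the
        // configured prefix survives Filter's reflective copy.
    }

    @Override
    public boolean accept(Key k, Value v) {
        ByteSequence cf = k.getColumnFamilyData();
        if (cf.length() < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (cf.byteAt(i) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

It would be attached at scan time via an IteratorSetting (e.g. new IteratorSetting(30, "cfPrefix", ColumnFamilyPrefixFilter.class) with addOption("prefix", "i")) passed to scanner.addScanIterator(...).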
>>>>>>>>
>>>>>>>> There's not a lot of hard/fast requirements. Most of what Phoenix does is optimize performance by leveraging the capabilities of the server. In terms of hard/fast requirements, these come to mind:
>>>>>>>> - data is returned in row key order from range scans
>>>>>>>> - a scan may set a start key/stop key to do a range scan
>>>>>>>> - a row key may be composed of arbitrary bytes
>>>>>>>> - a client may "pre-split" a table by providing the region boundaries at table create time (we rely on this for salting to prevent hotspotting: http://phoenix.incubator.apache.org/salted.html)
>>>>>>>> - the client has access to the region boundaries of a table (this allows for better parallelization)
>>>>>>>> - the client may chunk up a scan into smaller, multiple scans and run them in parallel
>>>>>>>> Some of these may be a bit squishy, as there may be existing machinery already in your client programming model that could be leveraged. The client API of HBase, for example, does not provide the ability out of the box to parallelize a scan, so this is something Phoenix had to add on top (through chunking up scans at or within region boundaries).
>>>>>>>
>>>>>>> All of these look fine. The Accumulo BatchScanner does that parallelization for you, which is really nice (handling tablet migration and all that fun stuff transparently).
>>>>>>
>>>>>> That's nice that Accumulo has this built-in. Does it allow the client to specify the split points for the scan in some way?
>>>>>
>>>>> Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest. You can be efficient by structuring your data so that you don't touch every tabletserver for every query -- this seems to be what's being suggested.
>>>>>
>>>>> <snip/>
>>>>>
>>>>> What do you think is next, James?
>>>>>
>>>>> I know I won't have a lot of time to devote to heavy development with what I've already signed up for in the next few months, but I'd still like to try to help out where possible. Is anyone else on the Accumulo side interested in getting involved?
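Finally, a short sketch of the client-side pieces referenced in the requirements list above: pre-splitting a table by handing Accumulo the tablet boundaries, and fanning a query out over multiple Ranges with a BatchScanner. The instance name, ZooKeeper host, table, and split points are all placeholders.

import java.util.Arrays;
import java.util.Map.Entry;
import java.util.TreeSet;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class ParallelScanExample {
    public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181")
                .getConnector("user", new PasswordToken("secret"));

        // Counterpart to HBase pre-splitting (what Phoenix salting relies
        // on): hand Accumulo the tablet boundaries up front. Assumes the
        // "data" table already exists.
        TreeSet<Text> splits = new TreeSet<Text>(
                Arrays.asList(new Text("g"), new Text("n"), new Text("t")));
        conn.tableOperations().addSplits("data", splits);

        // A BatchScanner takes many Ranges and fans them out across tablet
        // servers itself; the client needn't chunk scans at split points.
        BatchScanner bs = conn.createBatchScanner("data", Authorizations.EMPTY, 4);
        bs.setRanges(Arrays.asList(
                new Range("a", "f"),        // range scan: start/stop row
                Range.exact("row-12345"))); // point lookup
        for (Entry<Key, Value> e : bs) {
            // Note: entries come back unordered across ranges; the global
            // row-key ordering Phoenix wants would need a client-side merge
            // sort, per James's comment above.
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        bs.close();
    }
}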
