Re: SQL layer over Accumulo?

Josh Elser Tue, 29 Apr 2014 11:58:09 -0700

@Josh - it's less baked in than you'd think on the client, where the query
parsing, compilation, optimization, and orchestration occur. The
client/server interaction is hidden behind the ConnectionQueryServices
interface, the scanning behind ResultIterator (in
particular ScanningResultIterator), the DML behind MutationState, and
KeyValue interaction behind KeyValueBuilder. Yes, it would require
some more abstraction, but probably not too much. On the
server side, the entry points would all be different, and that's where I'd
need your insights into what's possible.

Definitely. I'm a little concerned about what's expected to be provided by the "database" (HBase, Accumulo), as I believe HBase is a little more flexible in allowing writes internally, whereas Accumulo has thus far said "you're gonna have a bad time".

@Eric - I agree about having txn support (probably through snapshot
isolation) by controlling the timestamp, and then layering indexing on top
of that. That's where we're headed. But I wouldn't let that stop the effort;
it would just be layered on top of what's already there. FWIW, there's
another interesting indexing model that has been termed "local indexing"
(https://github.com/Huawei-Hadoop/hindex), which is being worked on right now
(it should be available in either our 4.1 or 4.2 release). In this model, the
table data and index data are co-located on the same region server through
a kind of "buddy" region mechanism. The advantage is that you take no hit
at write time, as you're writing both the index and table data together.
Not sure how/if this would transfer over to the Accumulo world.
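The "controlling the timestamp" idea can be illustrated with a minimal, hypothetical Java sketch. The class and method names are made up for illustration and are not Phoenix, HBase, or Accumulo API. Both HBase and Accumulo attach a timestamp to every cell and can retain multiple versions; if writes carry a client-chosen timestamp and a reader only looks at versions at or before its own snapshot timestamp, that reader sees a consistent point-in-time view. This covers only the read side: real snapshot isolation also needs write-conflict detection at commit, which is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a multi-versioned cell store: each cell keeps
 * every version keyed by the timestamp it was written at.
 */
final class VersionedStore {
    // cell key -> (write timestamp -> value), timestamps kept sorted
    private final Map<String, NavigableMap<Long, String>> cells = new HashMap<>();

    /** Writes a new version of a cell at an explicit, caller-chosen timestamp. */
    void put(String cellKey, long timestamp, String value) {
        cells.computeIfAbsent(cellKey, k -> new TreeMap<>()).put(timestamp, value);
    }

    /**
     * Snapshot read: returns the newest version written at or before readTs,
     * or null if there is none. Versions written after readTs are invisible.
     */
    String get(String cellKey, long readTs) {
        NavigableMap<Long, String> versions = cells.get(cellKey);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.floorEntry(readTs);
        return entry == null ? null : entry.getValue();
    }
}
```

For example, after writing "a" at timestamp 10 and "b" at timestamp 20, a reader whose snapshot timestamp is 15 keeps seeing "a", while a reader at 25 sees "b". Reading a table and its index at the same snapshot timestamp is what would keep the two mutually consistent.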

Interesting. Given that Accumulo doesn't have a fixed column family schema, this might make index generation even easier (maybe "cleaner" is the proper word). You could easily co-locate the indices with the data, giving the index entries their own column family name.
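One way to picture the co-location idea above is a hypothetical Accumulo-style layout in which each shard is a single row holding both record data (column family "d") and index entries (column families named "i_<column>"). Because an Accumulo Mutation applies to a single row atomically, a record and its index entries could be written together, and an index lookup becomes a prefix range scan within the shard. The sketch models the sorted table with a java.util.TreeMap; the layout, the separator, and all names are assumptions for illustration, not anything Phoenix or Accumulo provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical model of a sorted key-value table where index entries live in the
 * same row (shard) as the data they index, separated only by column family.
 * Keys are flattened to row SEP family SEP qualifier so the TreeMap ordering
 * groups everything for one row, and then one family, together.
 */
final class LocalIndexSketch {
    // NUL sorts before every printable character, so "red" entries never interleave with "redder"
    private static final char SEP = '\u0000';
    private final TreeMap<String, String> table = new TreeMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    /** Writes one record plus its index entries, all under the same row (shard). */
    void putRecord(String shard, String docId, Map<String, String> columns) {
        for (Map.Entry<String, String> c : columns.entrySet()) {
            // data entry: family "d", qualifier = docId + column name
            table.put(key(shard, "d", docId + SEP + c.getKey()), c.getValue());
            // index entry: family "i_<column>", qualifier = value + docId
            table.put(key(shard, "i_" + c.getKey(), c.getValue() + SEP + docId), "");
        }
    }

    /** Returns the docIds in one shard whose column equals value, scanning only the index family. */
    List<String> lookup(String shard, String column, String value) {
        String prefix = key(shard, "i_" + column, value + SEP);
        SortedMap<String, String> range = table.subMap(prefix, prefix + Character.MAX_VALUE);
        List<String> docIds = new ArrayList<>();
        for (String k : range.keySet()) {
            docIds.add(k.substring(prefix.length()));
        }
        return docIds;
    }
}
```

As with a region-local index, each shard's index only covers that shard, so a lookup by value would have to consult every shard (in Accumulo, for example, with a BatchScanner given one range per shard).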

The problem still exists that we don't have a solid way to do this solely inside of Accumulo at the moment. I'd imagine that if someone stepped up to implement coprocessors, we'd be taking the route of a separate, standalone process (as opposed to in-RegionServer). Hypothetically, we could do the same for Phoenix in the short term.

Can you quantify what Accumulo would be expected to provide to integrate with Phoenix (maybe list, at a high level, exactly what is done inside of HBase?), so that we could give some more targeted ideas/feelings as to what the level of work would be inside Accumulo?

TLDR? Let's continue in the JIRA?

The mailing list is fine by me while we get this hashed out :). We can move to JIRA when we start getting into specifics.
