@Josh - it's less baked in than you'd think on the client, where the query parsing, compilation, optimization, and orchestration occur. The client/server interaction is hidden behind the ConnectionQueryServices interface, the scanning behind ResultIterator (in particular ScanningResultIterator), the DML behind MutationState, and KeyValue interaction behind KeyValueBuilder. It would require some more abstraction, but probably nothing too bad. On the server side, the entry points would all be different, and that's where I'd need your insights into what's possible.
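To make the abstraction point a bit more concrete, here is a rough sketch of what a storage-neutral seam could look like. This is not Phoenix code: `StoreServices`, `InMemoryStoreServices`, and `ClientPlan` are made-up names, and the in-memory `TreeMap` only stands in for a sorted key-value store. The idea is simply that client-side planning talks to an interface, so the backend behind it could in principle be swapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical seam, NOT Phoenix's real API: all names here are invented for illustration.
interface StoreServices {
    // Apply a batch of row-key -> value writes (roughly the role MutationState plays for DML).
    void commit(Map<String, String> mutations);

    // Return rows in [startKey, stopKey) in sorted key order
    // (roughly the role a ResultIterator plays for scans).
    List<Map.Entry<String, String>> scan(String startKey, String stopKey);
}

// In-memory stand-in for a sorted key-value store; an assumption made only for this sketch.
final class InMemoryStoreServices implements StoreServices {
    private final TreeMap<String, String> rows = new TreeMap<>();

    @Override
    public void commit(Map<String, String> mutations) {
        rows.putAll(mutations);
    }

    @Override
    public List<Map.Entry<String, String>> scan(String startKey, String stopKey) {
        // subMap gives a sorted view of the half-open key range; copy it into a list.
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).entrySet());
    }
}

// Client-side code that only sees the interface, never the concrete store.
final class ClientPlan {
    static int countRows(StoreServices services, String startKey, String stopKey) {
        return services.scan(startKey, stopKey).size();
    }
}
```

A caller would construct an `InMemoryStoreServices`, commit a few rows, and pass it to `ClientPlan.countRows`. A second implementation of the same interface is where backend-specific code would go.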
Definitely. I'm a little concerned about what's expected to be provided by the "database" (HBase, Accumulo), as I believe HBase is a little more flexible in allowing writes internally, whereas Accumulo has thus far said "you're gonna have a bad time".
@Eric - I agree about having txn support (probably through snapshot isolation) by controlling the timestamp, and then layering indexing on top of that. That's where we're headed. But I wouldn't let that stop the effort - it would just be layered on top of what's already there. FWIW, there's another interesting indexing model that has been termed "local indexing" (https://github.com/Huawei-Hadoop/hindex), which is being worked on right now (should be available in either our 4.1 or 4.2 release). In this model, the table data and index data are co-located on the same region server through a kind of "buddy" region mechanism. The advantage is that you take no hit at write time, as you're writing both the index and table data together. Not sure how/if this would transfer over to the Accumulo world.
Interesting. Given that Accumulo doesn't have a fixed column family schema, this might make index generation even easier (maybe "cleaner" is the proper word). You could easily co-locate the indices with the data, giving them a proper name.
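As a rough illustration of the co-location idea (a sketch only, not how hindex or Phoenix do it, and not real Accumulo API calls): assume each record is hashed to a partition row, with the record stored under a "data" family and its index entry under an "idx" family in that same row. Both entries then sort into the same row range and are written together. The class name, the family names, and the key layout are all assumptions, and a `TreeMap` stands in for the sorted store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a sorted map standing in for a table whose keys are
// row + family + qualifier. None of these names come from Accumulo or Phoenix.
final class CoLocatedIndexSketch {
    private static final char SEP = '\0';
    private final TreeMap<String, String> cells = new TreeMap<>();
    private final int partitions;

    CoLocatedIndexSketch(int partitions) {
        this.partitions = partitions;
    }

    private static String rowFor(int partition) {
        return String.format("p%03d", partition);
    }

    // Write the record and its index entry under the same partition row,
    // so the two always land in the same row range.
    // Note: this sketch does not remove stale index entries on update.
    void put(String id, String value) {
        String row = rowFor(Math.floorMod(id.hashCode(), partitions));
        cells.put(row + SEP + "data" + SEP + id, value);
        cells.put(row + SEP + "idx" + SEP + value + SEP + id, "");
    }

    // Look up ids by value: one short prefix scan of the "idx" family per partition.
    List<String> idsWithValue(String value) {
        List<String> ids = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            String prefix = rowFor(p) + SEP + "idx" + SEP + value + SEP;
            for (String key : cells.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
                ids.add(key.substring(prefix.length()));
            }
        }
        return ids;
    }
}
```

The trade-off in this layout is that writes stay local to one row, while a lookup by value has to touch every partition.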
The problem remains that we don't have a solid way to do this solely inside of Accumulo ATM. I'd imagine that if someone stepped up to implement coprocessors, we'd be taking the route of a separate, standalone process (as opposed to in-RegionServer). Hypothetically, we could do the same for Phoenix in the short term.
Can you quantify what would be expected of Accumulo to integrate with Phoenix (maybe list what exactly is done inside of HBase at a high level?) so that we could give some more targeted ideas/feelings as to what the level of work would be inside Accumulo?
TLDR? Let's continue in the JIRA?
Mailing list is fine by me for now while we get this hashed out :). We can move to Jira when we start getting into specifics.
