On 5/1/14, 2:24 AM, James Taylor wrote:
Thanks for the explanations, Josh. This sounds very doable. A few more
comments inline below.
James
On Wed, Apr 30, 2014 at 8:37 AM, Josh Elser <[email protected]> wrote:
On 4/30/14, 3:33 AM, James Taylor wrote:
On Tue, Apr 29, 2014 at 11:57 AM, Josh Elser <[email protected]>
wrote:
@Josh - it's less baked in than you'd think on the client, where the query
parsing, compilation, optimization, and orchestration occur. The
client/server interaction is hidden behind the ConnectionQueryServices
interface, the scanning behind ResultIterator (in particular
ScanningResultIterator), the DML behind MutationState, and KeyValue
interaction behind KeyValueBuilder. Yes, it would require some more
abstraction, but probably not too much. On the server side, the entry
points would all be different, and that's where I'd need your insights
into what's possible.
Definitely. I'm a little concerned about what's expected to be provided by
the "database" (HBase, Accumulo), as I believe HBase is a little more
flexible in allowing writes internally, where Accumulo has thus far said
"you're gonna have a bad time".
Tell me more about what you mean by "allowing writes internally".
Haha, sorry, that was a sufficiently ominous statement with insufficient
context.
For discussion's sake, let's just say HBase coprocessors and Accumulo
iterators are equivalent, purely in the scope of "running server-side code"
(in the RegionServer/TabletServer). However, there is a notable difference
in where in the pipeline each of those is implemented.
Coprocessors have built-in hooks that let you get updates on
PUT/GET/DELETE/etc as well as pre and post each of those operations. In
other words, they provide hooks at a "high database level".
Iterators tend to be much closer to the data itself, only dealing with
streams of data (other iterators stacked on one another). Iterators
implement versioning, visibilities, and can even implement complex
searches. The downside of this approach is that iterators lack any means to
safely write data _outside of the sorted Key-Value pairs in the tablet
currently being processed_. It's possible to make in-tablet updates, but
sorted order within a large tablet might make this difficult as well.
This is why I was thinking Percolator would be a better solution, as it's
meant for handling updates like this server-side. However, I imagine it
would be possible, in the short-term, to make some separate process between
Phoenix and Accumulo which handles writes.
Another fallback might be to do global index maintenance on the client.
It'd just be more expensive, especially if you want to handle out-of-order
updates (which are particularly tricky, as you have to get multiple
versions of the rows to work out all the different scenarios here).
A second fallback might be to support only local indexing. Does Accumulo
have the concept of a "custom load balancer" that would allow you to
co-locate two regions from different tables? The local-index feature has
kind of driven some feature requests on that front for HBase - mainly
callbacks when a region is split or re-located. The rows of the local index
are prefixed with the region start key to keep them together and identify
them.
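
To make the prefixing concrete, here is a minimal sketch of how a local
index row key might be built and recognized. The class and method names
are hypothetical, not Phoenix's actual code, and it ignores the first
region, whose start key is empty (every row would match an empty prefix).

```java
import java.util.Arrays;

class LocalIndexKeySketch {
    // Hypothetical helper: build a local index row key by prepending the
    // start key of the region that holds the corresponding data row.
    static byte[] localIndexRowKey(byte[] regionStartKey, byte[] indexRowKey) {
        byte[] out = Arrays.copyOf(regionStartKey,
                regionStartKey.length + indexRowKey.length);
        System.arraycopy(indexRowKey, 0, out, regionStartKey.length,
                indexRowKey.length);
        return out;
    }

    // Hypothetical helper: a row belongs to a region's local index if it
    // starts with that region's start key (assumes a non-empty start key).
    static boolean belongsToRegion(byte[] localIndexRow, byte[] regionStartKey) {
        if (localIndexRow.length < regionStartKey.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(localIndexRow, regionStartKey.length);
        return Arrays.equals(prefix, regionStartKey);
    }
}
```

Because the prefix is the region start key, the index rows sort into the
same key range as the data rows they describe, which is what keeps them
together.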
Agreed with what Bill said. Co-locating indices within the same row
simplifies this a bit, IMO.
<snip/>
There's not a lot of hard/fast requirements. Most of what Phoenix does is
to optimize performance by leveraging the capabilities of the server. In
terms of hard/fast requirements, these come to mind:
- data is returned in row key order from range scans
- a scan may set a start key/stop key to do a range scan
- a row key may be composed of arbitrary bytes
- a client may "pre-split" a table by providing the region boundaries at
table create time (we rely on this for salting to prevent hotspotting:
http://phoenix.incubator.apache.org/salted.html).
- the client has access to the region boundaries of a table (this allows
for better parallelization)
- the client may chunk up a scan into smaller, multiple scans and run
them in parallel
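
As a rough illustration of the salting and pre-splitting item above: a
salt byte derived from the row key is prepended to the key, and the table
is pre-split at each salt value. The sketch below assumes a simple
stand-in hash (Arrays.hashCode) and hypothetical method names; it does not
reproduce Phoenix's actual hash function.

```java
import java.util.Arrays;

class SaltingSketch {
    // Hypothetical helper: prepend one salt byte computed from the row key.
    // Arrays.hashCode is only a stand-in hash for illustration.
    static byte[] salt(byte[] rowKey, int buckets) {
        int bucket = (Arrays.hashCode(rowKey) & 0x7fffffff) % buckets;
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    // Hypothetical helper: split points for pre-splitting a table so that
    // each salt bucket starts its own region (bucket 0 needs no split point).
    static byte[][] splitPoints(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }
}
```

Sequential row keys then land in different buckets, which spreads writes
across regions at the cost of having to scan every bucket for a range
query.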
Some of these may be a bit squishy, as there may be existing machinery
already in your client programming model that could be leveraged. The
client API of HBase, for example, does not provide the ability out of the
box to parallelize a scan, so this is something Phoenix had to add on top
(through chunking up scans at or within region boundaries).
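
The chunking idea can be sketched as follows: given a scan's start and
stop keys and the table's sorted region boundaries, emit one sub-range per
region the scan touches. This is only an illustration under the assumption
of bounded, non-empty start and stop keys; the names are hypothetical and
it is not Phoenix's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class ScanChunkerSketch {
    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Hypothetical helper: split [start, stop) at every region boundary that
    // falls strictly inside it. Each chunk is a {startKey, stopKey} pair.
    // Assumes boundaries are sorted ascending.
    static List<byte[][]> chunk(byte[] start, byte[] stop, List<byte[]> boundaries) {
        List<byte[][]> chunks = new ArrayList<>();
        byte[] cursor = start;
        for (byte[] boundary : boundaries) {
            if (compare(boundary, cursor) <= 0) {
                continue; // boundary lies at or before the current position
            }
            if (compare(boundary, stop) >= 0) {
                break; // boundary lies at or past the end of the scan
            }
            chunks.add(new byte[][] {cursor, boundary});
            cursor = boundary;
        }
        if (compare(cursor, stop) < 0) {
            chunks.add(new byte[][] {cursor, stop});
        }
        return chunks;
    }
}
```

Each resulting chunk could then be handed to its own thread; chunking
within a region would just mean adding extra split keys to the boundary
list.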
All of these look fine. The Accumulo BatchScanner does that
parallelization for you which is really nice (handling tablet migration and
all that fun stuff transparently).
That's nice that Accumulo has this built-in. Does it allow the client to
specify the split points for the scan in some way?
Assuming I understand properly, you don't need to be cognizant of the
splits. You just specify the Ranges (where each Range is a start key and
end key) and the Accumulo client API does the rest. You can be efficient
by structuring your data so that you don't touch every tabletserver for
every query -- this seems to be what's being suggested.
<snip/>
What do you think is next, James?
I know I won't have a lot of time to devote to heavy development with
what I've already signed up for in the next few months, but I'd still
like to try to help out where possible. Is anyone else on the Accumulo
side interested in getting involved?