On Thu, Jul 23, 2015 at 4:03 PM, Russ Weeks <[email protected]> wrote:
> Thanks very much Keith, that's very helpful. It's nice to start to see how
> all the pieces fit together - I assume the counter you're referring to is
> kvCount in MemKey.
>

Yep, kvCount is what I was referring to.

>
> Regards,
> -Russ
>
> On Thu, Jul 23, 2015 at 10:19 AM Keith Turner <[email protected]> wrote:
>
> > On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks <[email protected]>
> > wrote:
> >
> > > Thanks for your response, Keith. Your suggestion to implement paging by
> > > refining the scan range makes a lot of sense. Maybe I'm just getting too
> > > caught up in mirroring Titan's HBase adaptor, I wonder why they've
> > > implemented it on the server-side.
> > >
> >
> > I think that approach is at least O((C/B)^2) where C is # columns and B is
> > the batch size being brought back each time.
> >
> > >
> > > I hadn't considered the IsolatedScanner, in fact I've never used it before.
> > > Can I ask, what sort of black magic is happening in the Tablet servers to
> > > implement that isolation? Is it somehow snapshotting the tablet prior to
> > > running the scan?
> > >
> >
> > Enabling isolation on a scanner ensures that data sources do not change
> > while scanning a row. The scan uses the same set of files and iterator
> > stack while scanning a row. For in-memory data there is a counter for each
> > insert; using this counter, a scan does not see data inserted after it
> > obtained an iterator.
> >
> > In the case of a tablet server fault, isolation is not maintained across
> > the fault. When isolation is enabled on a regular scanner it will detect
> > this and throw an isolation exception. When using the IsolatedScanner it
> > will buffer rows and only return the row if the entire row was read without
> > seeing an isolation exception. If the isolated scanner sees an isolation
> > exception it throws the current row away and starts over, reseeking its
> > wrapped scanner to the beginning of the row.
> >
> > Below are some links that may be helpful.
> >
> > http://accumulo.apache.org/1.6/examples/isolation.html
> > http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner
> >
> > The link below has some info that should be rolled into the user manual if
> > it's not there.
> >
> > https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html
> >
> > > Regards,
> > > -Russ
> > >
> > > On Wed, Jul 22, 2015 at 12:17 PM Keith Turner <[email protected]> wrote:
> > >
> > > > On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks <[email protected]>
> > > > wrote:
> > > >
> > > > > Hey, folks,
> > > > >
> > > > > Any ideas how I might go about implementing a column pagination filter
> > > > > similar to HBase's [1]? Translated to Accumulo, this would be an iterator
> > > > > that skips the first m columns in a row and returns the next n columns.
> > > > >
> > > > > The catch as far as I can tell is that Accumulo could re-seek the iterator
> > > > > at any time, screwing up the internal count of how many columns have been
> > > > > seen. I guess the only way to resolve that would be to force every seek to
> > > > > start at the beginning of a row, and the filter logic would only pass a KV
> > > > > pair if it's in both the pagination range and the seek range.
> > > > >
> > > >
> > > > An iterator will not be reseeked unless it returns something. So when
> > > > skipping the 1st M columns of a row, the iterator would not be torn down
> > > > and reseeked. However when returning the N columns, the iterator could be
> > > > torn down and reseeked.
> > > >
> > > > Since you are working within a row, there are two ways to avoid this. You
> > > > can use an IsolatedScanner which will prevent the iterator from being torn
> > > > down within a row. Alternatively, you could wrap your special iterator
> > > > with a WholeRowIterator.
> > > >
> > > > Curious, would seeking a scanner to the last row:column seen (non
> > > > inclusive) and reading N columns from the scanner work?
> > > >
> > > > >
> > > > > This work is in the context of ACCUMULO-638 (and ATLAS-40) which I'll take
> > > > > ownership of as soon as I make a little more headway...
> > > > >
> > > > > 1:
> > > > > https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
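
For anyone reading this thread later, here is a rough sketch of the "seek to the last row:column seen (non inclusive) and read N columns" idea discussed above. It is only an in-memory model: the row is a java.util.TreeMap of column name to value, and ColumnPager / nextPage are made-up names for illustration, not Accumulo API. Against a real table the analogous step would presumably be a scan Range whose start key is the last Key returned, marked exclusive, but that part is an assumption and is not shown here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory model of client-side column paging by refining the scan range. */
public class ColumnPager {

    /**
     * Returns up to pageSize columns that sort strictly after lastSeen.
     * Pass null for lastSeen to start at the beginning of the row.
     */
    public static List<Map.Entry<String, String>> nextPage(
            NavigableMap<String, String> row, String lastSeen, int pageSize) {
        // Stand-in for "seek to the last column seen, non-inclusive":
        // tailMap(key, false) excludes the key itself.
        NavigableMap<String, String> remaining =
                (lastSeen == null) ? row : row.tailMap(lastSeen, false);

        List<Map.Entry<String, String>> page = new ArrayList<>();
        for (Map.Entry<String, String> e : remaining.entrySet()) {
            if (page.size() == pageSize) {
                break; // read only N columns, then stop
            }
            page.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return page;
    }

    public static void main(String[] args) {
        // Hypothetical row with five columns.
        NavigableMap<String, String> row = new TreeMap<>();
        for (String column : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(column, "value-" + column);
        }

        // The only paging state the client keeps is the last column it saw.
        String lastSeen = null;
        List<Map.Entry<String, String>> page;
        while (!(page = nextPage(row, lastSeen, 2)).isEmpty()) {
            System.out.println(page);
            lastSeen = page.get(page.size() - 1).getKey();
        }
    }
}
```

With five columns and a page size of 2 this yields three pages of 2, 2 and 1 columns. Each page starts right after the last column returned, so nothing is re-read and no count of skipped columns has to survive a re-seek.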
