Observation from the cuckoo's nest...

Driving the pagination from the client wouldn't necessitate the IsolatedScanner, would it? That is, unless you want that stronger isolation. I couldn't think of a reason to need it, but I wasn't sure whether I had missed some finer point.

FWIW, my gut reaction was that trying to do the pagination at the server would be difficult and problematic with little net benefit (you're not actually reducing any data -- the client will get it all in the end). This also got me wondering whether there's a good way to enable things like this via the standard public API. Pagination is clearly a generally reusable pattern -- I wonder whether other problems fit the same mold and could be expressed in some standard way.

Keith Turner wrote:
On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks wrote:

Thanks for your response, Keith. Your suggestion to implement paging by
refining the scan range makes a lot of sense. Maybe I'm just getting too
caught up in mirroring Titan's HBase adaptor; I wonder why they
implemented it on the server side.


I think that approach is at least O((C/B)^2), where C is the number of
columns and B is the batch size brought back each time.
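To see where the quadratic term could come from, here is a rough count of entries touched while paging through one row. It assumes "that approach" means every page request restarts at the beginning of the row and skips the columns already returned, while range refinement resumes just after the last key returned. The class and method names are hypothetical, and the count ignores everything except entries read.

```java
/**
 * Rough cost model (hypothetical helpers): paging through one row of
 * `columns` cells, `batch` cells per page, counting entries touched.
 */
final class PagingCost {

    private PagingCost() {}

    /** Skip-then-return paging: every page re-reads all the columns it skips. */
    static long serverSideEntriesScanned(long columns, long batch) {
        if (batch <= 0) throw new IllegalArgumentException("batch must be positive");
        long total = 0;
        for (long skipped = 0; skipped < columns; skipped += batch) {
            // Cells skipped to reach this page, plus up to one batch of returned cells.
            total += Math.min(skipped + batch, columns);
        }
        return total;
    }

    /** Range refinement: each page resumes just after the last key returned. */
    static long rangeRefinementEntriesScanned(long columns, long batch) {
        if (batch <= 0) throw new IllegalArgumentException("batch must be positive");
        long total = 0;
        for (long start = 0; start < columns; start += batch) {
            // Only the cells on this page are read.
            total += Math.min(batch, columns - start);
        }
        return total;
    }
}
```

With C = 100 and B = 10 the first count is 550 entries against 100 for the second; at C = 1000 it is 50,500 against 1,000. Divided by B, the first count is P(P+1)/2 for P = C/B pages, which lines up with the (C/B)^2 estimate.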


I hadn't considered the IsolatedScanner; in fact, I've never used it before.
Can I ask, what sort of black magic is happening in the tablet servers to
implement that isolation? Is it somehow snapshotting the tablet prior to
running the scan?


Enabling isolation on a scanner ensures that data sources do not change
while scanning a row. The scan uses the same set of files and iterator
stack while scanning a row. For in-memory data there is a counter for each
insert; using this counter, a scan does not see data inserted after it
obtained an iterator.
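The counter idea for in-memory data can be illustrated with a toy map. This is a simplified model, not Accumulo's actual in-memory map: every insert is tagged with an increasing number, a snapshot remembers the counter value when it was taken, and reads through that snapshot skip anything tagged later. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory map (hypothetical) illustrating counter-based snapshot visibility. */
final class CountingMemoryMap {

    /** A stored value tagged with the insert counter at the time it was written. */
    private static final class Versioned {
        final long insertNumber;
        final String value;

        Versioned(long insertNumber, String value) {
            this.insertNumber = insertNumber;
            this.value = value;
        }
    }

    private final TreeMap<String, List<Versioned>> data = new TreeMap<>();
    private long counter = 0;

    /** Each insert bumps the counter and records it alongside the value. */
    synchronized void put(String key, String value) {
        counter++;
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(new Versioned(counter, value));
    }

    /** Captures the counter now; later inserts are invisible to this snapshot. */
    synchronized Snapshot snapshot() {
        return new Snapshot(counter);
    }

    final class Snapshot {
        private final long visibleUpTo;

        private Snapshot(long visibleUpTo) {
            this.visibleUpTo = visibleUpTo;
        }

        /** Returns the newest value per key among inserts made before the snapshot. */
        Map<String, String> read() {
            synchronized (CountingMemoryMap.this) {
                TreeMap<String, String> out = new TreeMap<>();
                for (Map.Entry<String, List<Versioned>> e : data.entrySet()) {
                    for (Versioned v : e.getValue()) {
                        if (v.insertNumber <= visibleUpTo) {
                            out.put(e.getKey(), v.value); // later visible versions overwrite earlier
                        }
                    }
                }
                return out;
            }
        }
    }
}
```

A snapshot taken after the first insert keeps reporting only that insert, no matter how many writes land afterwards; a fresh snapshot sees everything.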

In the case of a tablet server fault, isolation is not maintained across
the fault. When isolation is enabled on a regular scanner, it will detect
this and throw an isolation exception. The IsolatedScanner instead buffers
rows and only returns a row if the entire row was read without seeing an
isolation exception. If the isolated scanner sees an isolation exception,
it throws the current row away and starts over, reseeking its wrapped
scanner to the beginning of the row.
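The buffer-and-retry behavior described above might look roughly like this. It is a toy model rather than the real IsolatedScanner: `RowSource` and `IsolationBrokenException` are hypothetical stand-ins for the wrapped scanner and the isolation exception, and the attempt cap is an addition for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model (hypothetical types) of buffering a row and restarting it on an isolation fault. */
final class RowBufferingReader {

    /** Hypothetical signal that the underlying scan lost isolation mid-row. */
    static final class IsolationBrokenException extends RuntimeException {
        IsolationBrokenException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a scanner that can be reseeked to the start of a row. */
    interface RowSource {
        Iterator<String> seekToRowStart(String row);
    }

    private RowBufferingReader() {}

    /**
     * Reads every cell of {@code row} into a buffer. If isolation breaks part-way,
     * the partial buffer is discarded and the row is read again from the start,
     * so the caller only ever sees a complete row.
     */
    static List<String> readWholeRow(RowSource source, String row, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> buffer = new ArrayList<>();
            try {
                Iterator<String> cells = source.seekToRowStart(row);
                while (cells.hasNext()) {
                    buffer.add(cells.next());
                }
                return buffer; // whole row read without a fault
            } catch (IsolationBrokenException e) {
                // Throw the partial row away; the next attempt reseeks to the row start.
            }
        }
        throw new IllegalStateException("row " + row + " not read after " + maxAttempts + " attempts");
    }
}
```

The key property is that a caller never observes half a row: either the buffer is returned complete, or it is dropped and rebuilt.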

Below are some links that may be helpful.

http://accumulo.apache.org/1.6/examples/isolation.html
http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner

The link below has some info that should be rolled into the user manual if
it's not there.

https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html


Regards,
-Russ

On Wed, Jul 22, 2015 at 12:17 PM Keith Turner wrote:

On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks wrote:

Hey, folks,

Any ideas how I might go about implementing a column pagination filter
similar to HBase's [1]? Translated to Accumulo, this would be an iterator
that skips the first m columns in a row and returns the next n columns.
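Ignoring re-seeks for the moment, the per-row logic being asked for can be modeled over cells already sorted by row and column. This is plain Java, not an implementation of Accumulo's iterator interface; `Cell` and `paginate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model (hypothetical) of per-row column pagination: skip m, keep n. */
final class ColumnPagination {

    /** Hypothetical cell: row id plus column name, already sorted by (row, column). */
    static final class Cell {
        final String row;
        final String column;

        Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }

        @Override
        public String toString() {
            return row + ":" + column;
        }
    }

    private ColumnPagination() {}

    /** For each row, drops the first m cells and returns at most the next n. */
    static List<Cell> paginate(List<Cell> sortedCells, int m, int n) {
        List<Cell> out = new ArrayList<>();
        String currentRow = null;
        int indexInRow = 0;
        for (Cell cell : sortedCells) {
            if (!cell.row.equals(currentRow)) {
                currentRow = cell.row; // new row: restart the column count
                indexInRow = 0;
            }
            if (indexInRow >= m && indexInRow - m < n) {
                out.add(cell);
            }
            indexInRow++;
        }
        return out;
    }
}
```

The whole state is `indexInRow`, which is exactly the count that a mid-row re-seek would lose.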

The catch as far as I can tell is that Accumulo could re-seek the
iterator at any time, screwing up the internal count of how many columns
have been seen. I guess the only way to resolve that would be to force
every seek to start at the beginning of a row, and the filter logic would
only pass a KV pair if it's in both the pagination range and the seek range.
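That proposed fix can be modeled the same way. When asked to seek to a key in the middle of a row, the model backs up to the first cell of that row so the column count is rebuilt, then passes a cell only if it is inside the [m, m + n) window and not before the requested seek key. In a real iterator this would presumably mean seeking the source to the start of the requested row; the names and the `TreeSet` table are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Toy model (hypothetical) of a pagination iterator that is safe to re-seek mid-row. */
final class ReseekablePagination {

    /** Hypothetical (row, column) key, ordered by row then column. */
    static final class Cell implements Comparable<Cell> {
        final String row;
        final String column;

        Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }

        @Override
        public int compareTo(Cell other) {
            int byRow = row.compareTo(other.row);
            return byRow != 0 ? byRow : column.compareTo(other.column);
        }

        @Override
        public String toString() {
            return row + ":" + column;
        }
    }

    private ReseekablePagination() {}

    /**
     * Handles a seek to {@code seekStart} (inclusive) by backing up to the start of
     * that row, recounting columns from there, and emitting a cell only if it is in
     * the pagination window AND at or after {@code seekStart}.
     */
    static List<Cell> seek(NavigableSet<Cell> table, Cell seekStart, int m, int n) {
        List<Cell> out = new ArrayList<>();
        Cell rowStart = new Cell(seekStart.row, ""); // "" sorts before any non-empty column
        String currentRow = null;
        int indexInRow = 0;
        for (Cell cell : table.tailSet(rowStart, true)) {
            if (!cell.row.equals(currentRow)) {
                currentRow = cell.row;
                indexInRow = 0;
            }
            boolean inWindow = indexInRow >= m && indexInRow - m < n;
            boolean inSeekRange = cell.compareTo(seekStart) >= 0;
            if (inWindow && inSeekRange) {
                out.add(cell);
            }
            indexInRow++;
        }
        return out;
    }
}
```

Re-seeking to `r1:c` with m = 1 and n = 3 still counts `r1:a` and `r1:b`, so it resumes with `r1:c` and `r1:d` instead of restarting the window.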

An iterator will not be reseeked unless it returns something. So when
skipping the first M columns of a row, the iterator would not be torn down
and reseeked. However, when returning the N columns, the iterator could be
torn down and reseeked.

Since you are working within a row, there are two ways to avoid this. You
can use an IsolatedScanner which will prevent the iterator from being torn
down within a row. Alternatively, you could wrap your special iterator
with a WholeRowIterator.
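The WholeRowIterator route helps because each row is handed over as a single unit, so there is no point mid-row at which the stack can be torn down. A plain-Java model of that effect (hypothetical names; it does not reproduce WholeRowIterator's actual encoding) first collects each row completely and only then pages within it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical) of paginating rows that arrive as whole units. */
final class WholeRowPagination {

    private WholeRowPagination() {}

    /**
     * Groups sorted {row, column} pairs into complete rows, then keeps columns
     * [m, m + n) of each row. Rows with nothing left after paging are dropped.
     */
    static Map<String, List<String>> paginateWholeRows(List<String[]> sortedCells, int m, int n) {
        // Stage 1: collect each row completely before counting any columns.
        Map<String, List<String>> rows = new LinkedHashMap<>();
        for (String[] cell : sortedCells) {
            rows.computeIfAbsent(cell[0], r -> new ArrayList<>()).add(cell[1]);
        }
        // Stage 2: page within each complete row.
        Map<String, List<String>> pages = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            List<String> columns = row.getValue();
            int from = Math.min(m, columns.size());
            int to = (int) Math.min((long) from + n, columns.size());
            if (to > from) {
                pages.put(row.getKey(), new ArrayList<>(columns.subList(from, to)));
            }
        }
        return pages;
    }
}
```

The trade-off is memory: a whole row has to fit in one buffer, which may matter for very wide rows.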

Curious, would seeking a scanner to the last row:column seen (non-inclusive)
and reading N columns from the scanner work?
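That client-driven alternative can be sketched with a `TreeMap` standing in for the table: remember the last key returned and start the next page strictly after it. In Accumulo the rough equivalent would presumably be a scan whose range starts at the last key seen with the start marked non-inclusive, reading at most N entries. The helper name is hypothetical, and the toy table holds a single row; a real scan would also cap the range at the end of that row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Client-driven paging sketch (hypothetical): each page resumes just after the last key seen. */
final class ClientPager {

    private ClientPager() {}

    /**
     * Returns up to {@code pageSize} keys strictly after {@code lastSeen},
     * or from the beginning when {@code lastSeen} is null.
     */
    static List<String> nextPage(NavigableMap<String, String> table, String lastSeen, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        NavigableMap<String, String> rest =
                lastSeen == null ? table : table.tailMap(lastSeen, false); // non-inclusive start
        List<String> page = new ArrayList<>();
        for (Map.Entry<String, String> e : rest.entrySet()) {
            if (page.size() == pageSize) {
                break; // stop after N entries; the next call resumes after the last one
            }
            page.add(e.getKey());
        }
        return page;
    }
}
```

No state survives between pages except the last key, so nothing on the server can lose count, and each page only reads the entries it returns.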


This work is in the context of ACCUMULO-638 (and ATLAS-40), which I'll take
ownership of as soon as I make a little more headway...

[1] https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
