Some of the table design patterns we like to use on Accumulo produce rows that don't fit into memory, so having row-level isolation without requiring rows to fit in memory was important to us. However, this is not trivial, especially under failures.

The basic technique we use involves keeping a mutation counter for all active scans on a tablet, writing the mutation counter with entries in the in-memory map, and keeping all of the data we need to provide a snapshot isolation view for the existing scans.

The tricky part here is that if a tablet server fails then the recovery of a tablet on another tablet server doesn't include a recovery of the list of active scans. The tablet server might decide to minor compact, and the data needed to provide the row-level snapshot-isolation view might be lost when the entries flow through the iterator tree.

We allow for many ways of dealing with this isolation fault. The Scanner ignores it by default. Users can also turn on the isolation exception via Scanner.enableIsolation(), resulting in the possibility of an IsolationException (subclass of RuntimeException) being thrown by the ScannerIterator. The IsolatedScanner wraps a Scanner, enables isolation on that scanner, buffers rows on the client side (possibly on disk), and can handle the IsolationException by restarting at the beginning of a row. Handling isolation without buffering is also possible by using a checkpoint and restart design that propagates through the application code, so we wanted to support that behavior by letting applications handle the exception in their own way.

Sorry about the lack of documentation! We'll get working on it.

Adam

On Wed, Dec 21, 2011 at 11:45 AM, Aaron Cordova <[email protected]> wrote:
> I'm looking over the IsolatedScanner and wondering, since you've all
> probably thought more about it than I, whether loading a row entirely into
> memory is required to provide row isolation, or whether it simply makes it
> easier to implement.
>
> The BigTable paper says it makes the rows in the memtable copy-on-write.
> Does this imply copying the entire row into memory first? That would seem
> to make read-modify-write operations simpler, but it doesn't seem a
> necessary condition for just writes ...
>
> In the future, is the intention to provide row-isolation upon request (via
> using the IsolatedScanner), thereby making non-atomic reads (via the
> Scanner) the default?
>
> Aaron
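
As a rough illustration of the buffer-and-restart behavior described above for the IsolatedScanner, here is a minimal sketch in plain Java. It does not use the Accumulo API: IsolationFault, Entry, Source, readAll and flakySource are hypothetical stand-ins invented for this example, and the real IsolatedScanner may well differ in detail. The sketch assumes a scan can be reopened at the first entry of a given row, and it retries without limit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class IsolatedRowReaderSketch {
    /** Hypothetical stand-in for the exception a scan throws on an isolation fault. */
    static final class IsolationFault extends RuntimeException {}

    /** Hypothetical key/value entry; only the row id matters for this sketch. */
    static final class Entry {
        final String row, column, value;
        Entry(String row, String column, String value) {
            this.row = row; this.column = column; this.value = value;
        }
    }

    /** Hypothetical scan source: opens a scan at the first entry of startRow (null = start of table). */
    interface Source {
        Iterator<Entry> openAtRow(String startRow);
    }

    /** Reads every entry, releasing a row only once it has been read completely. */
    static List<Entry> readAll(Source source) {
        List<Entry> emitted = new ArrayList<>();
        List<Entry> rowBuffer = new ArrayList<>();
        String currentRow = null; // row currently held in rowBuffer
        while (true) {
            rowBuffer.clear(); // drop any partially read row before (re)starting
            Iterator<Entry> scan = source.openAtRow(currentRow);
            try {
                while (scan.hasNext()) {
                    Entry e = scan.next();
                    if (currentRow != null && !e.row.equals(currentRow)) {
                        // The previous row is complete, so it is safe to release it.
                        emitted.addAll(rowBuffer);
                        rowBuffer.clear();
                    }
                    currentRow = e.row;
                    rowBuffer.add(e);
                }
                emitted.addAll(rowBuffer); // release the final row
                return emitted;
            } catch (IsolationFault fault) {
                // Nothing from currentRow has been released yet, so the outer
                // loop can simply reopen the scan at the beginning of that row.
            }
        }
    }

    /** Hypothetical in-memory source that throws one fault when index failAtIndex is first read. */
    static Source flakySource(List<Entry> data, int failAtIndex) {
        boolean[] alreadyFailed = {false};
        return startRow -> {
            int start = 0;
            while (startRow != null && start < data.size()
                    && !data.get(start).row.equals(startRow)) {
                start++;
            }
            int first = start;
            return new Iterator<Entry>() {
                private int pos = first;
                @Override public boolean hasNext() { return pos < data.size(); }
                @Override public Entry next() {
                    if (pos == failAtIndex && !alreadyFailed[0]) {
                        alreadyFailed[0] = true;
                        throw new IsolationFault();
                    }
                    return data.get(pos++);
                }
            };
        };
    }
}
```

Calling readAll(flakySource(data, 1)) on a small sorted list exercises the restart path once. Because a row is released only after the first entry of the next row (or the end of the scan) has been seen, a fault in the middle of a row never exposes a partial row to the caller. The cost is that the whole row has to be buffered, which is exactly the tradeoff that the checkpoint-and-restart alternative avoids.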
