Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
All, thanks again for your feedback. I just consolidated some of these learnings, with some code samples, here:

http://www.mammothdatallc.com/blog/accumulo-in-depth-look-at-filters-combiners-iterators-against-complex-values/

Best,
-Mike

On Fri, Jul 18, 2014 at 11:54 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote:

Oh wow, I have totally read your problem incorrectly then. I thought you wanted a total count across rows for some reason (when you mentioned you had versioning turned off, things clicked). You can use a combiner, but I'd write an iterator that strips out the count field for each value (like we did with the other iterator), and then place that lower in the iterator stack. This way you can get around your original issue with the combiner only taking a single input/output type.

On Tue, Jul 15, 2014 at 2:25 PM, Adam Fuchs afu...@apache.org wrote:

Mike,

The way we usually aggregate by row is to check the source's top key within the next function to see if it breaks the row boundary. If your source starts giving you data in the next row, then break out of the loop in the next function. You'll also need to construct a row key to return from your iterator and then handle the reseeking case (automatic seeking to the second key in a row). See the RowEncodingIterator for hints on implementation. You might actually want to subclass RowEncodingIterator to implement your counter.

Cheers,
Adam

Cool. I'll write something up and share. I'm curious how to get my Counter (WrappingIterator) implementation to aggregate by row (which, for some reason, I assumed was the default?). Let's say I have rows (and CF=, CQ= and VersioningIterator off):

    1 (Value1, Value 2...Value N)
    2
    3

How can my iterator return the following?

    1 (Count of values 1..N)
    2 (Count of values 1..N)
    3 ...

I tried scan -b 1 -e 1 and it counts an individual row. But if I don't specify anything, it returns:

    3 (Count of all values across all rows)

Code: http://pastebin.com/8xFNLHFS

Example:

    root@dev pe listiter -scan -t pojo
    -
    -Iterator counter, scan scope options:
    -iteratorPriority = 10
    -iteratorClassName = iterators.Counter
    -
    root@dev pe scan -b 1_1_20140101 -e 1_1_20140101
    1_1_20140101 : [public]65
    root@dev pe scan -b 1_1_20140101 -e 3_9_20140727
    3_9_20140727 : [public]10
    root@dev pe scan
    3_9_20140727 : [public]10

Thanks.
-Mike
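Adam's row-boundary check can be sketched in plain Java, without the Accumulo classes. This is only a simplified model: `Entry`, `Source`, and `countPerRow` are hypothetical names invented for illustration and are not part of the Accumulo API. It walks a sorted source and closes out a count whenever the source's top entry moves to a different row. A real iterator would run the inner loop once per call to next() and would also have to build the row key it returns and handle reseeking, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {
    /** Hypothetical stand-in for a key/value entry; only the row matters here. */
    static final class Entry {
        final String row;
        final String value;
        Entry(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a sorted source with a hasTop/top/next protocol. */
    static final class Source {
        private final List<Entry> entries;
        private int pos = 0;
        Source(List<Entry> sortedEntries) { this.entries = sortedEntries; }
        boolean hasTop() { return pos < entries.size(); }
        Entry top() { return entries.get(pos); }
        void next() { pos++; }
    }

    /** Produces one "row=count" result per row by stopping at each row boundary. */
    static List<String> countPerRow(Source source) {
        List<String> results = new ArrayList<>();
        while (source.hasTop()) {
            String currentRow = source.top().row;
            long count = 0;
            // Consume entries until the source's top moves to a different row.
            while (source.hasTop() && source.top().row.equals(currentRow)) {
                count++;
                source.next();
            }
            results.add(currentRow + "=" + count);
        }
        return results;
    }

    public static void main(String[] args) {
        Source source = new Source(Arrays.asList(
            new Entry("1", "a"), new Entry("1", "b"), new Entry("1", "c"),
            new Entry("2", "d"),
            new Entry("3", "e"), new Entry("3", "f")));
        System.out.println(countPerRow(source)); // prints [1=3, 2=1, 3=2]
    }
}
```

Without the inner row check, the loop would run to the end of the source and report a single total, which matches the behaviour Mike saw on an unbounded scan.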
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
I set up debugging and am rethrowing the exception. What's strange is that, although the iterator instance is properly set to iterator.Counter (my implementation), my breakpoints aren't being hit, only those in the parent classes (WrappingIterator and SortedKeyValueIterator).

I have two rows in the table. When I scan with no iterator:

    2014-07-15 06:46:26,577 [Audit ] INFO : operation: permitted; user: root; action: scan; targetTable: pojo; authorizations: public,; range: (-inf,+inf); columns: []; iterators: []; iteratorOptions: {};
    2014-07-15 06:46:26,589 [tserver.TabletServer] DEBUG: ScanSess tid 10.0.2.15:45073 8 *2 entries* in 0.01 secs, nbTimes = [7 7 7.00 1]

When I scan with the iterator (0 entries?):

    2014-07-15 06:45:58,036 [Audit ] INFO : operation: permitted; user: root; action: scan; targetTable: pojo; authorizations: public,; range: (-inf,+inf); columns: []; iterators: []; iteratorOptions: {};
    2014-07-15 06:45:58,047 [tserver.TabletServer] DEBUG: ScanSess tid 10.0.2.15:44992 8 *0 entries* in 0.01 secs, nbTimes = [6 6 6.00 1]

No exceptions otherwise. Really appreciate all the ongoing help.

Best,
-Mike

On Mon, Jul 14, 2014 at 6:40 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote:

Anything in your tserver log? I think you should just rethrow that IOException from your source's next() method, since they're usually not recoverable (i.e., just make Counter#next throw IOException).

On Mon, Jul 14, 2014 at 5:48 PM, Josh Elser josh.el...@gmail.com wrote:

A quick sanity check is to make sure you have data in the table and that you can read the data without your iterator (I've thought I had a bug because I didn't have proper visibilities more times than I'd like to admit).

Alternatively, you can also enable remote debugging via Eclipse into the TabletServer, which might help you understand more of what's going on. There are lots of articles on how to set this up [1]. In short, add -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8000 to ACCUMULO_TSERVER_OPTS in accumulo-env.sh, restart the tserver, connect Eclipse to port 8000 via the Debug configuration menu, set a breakpoint in your init, seek and next methods, and `scan` in the shell.

[1] http://javarevisited.blogspot.com/2011/02/how-to-setup-remote-debugging-in.html

On 7/14/14, 5:33 PM, Michael Moss wrote:

Hmm... Still doesn't return anything from the shell. http://pastebin.com/ndRhspf8

Any thoughts? What's the best way to debug these?

On Mon, Jul 14, 2014 at 5:14 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote:

Ah, an artifact of me just willy-nilly writing an iterator :) Any reference to `this.source` should be replaced with `this.getSource()`.

In `next()`, your workaround ends up calling `this.hasTop()` as the while loop condition. It will always return false because two lines up we set `top_key` to null. We need to make sure that the source iterator has a top, because we want to read data from it. We'll have to change the loop condition to `while(this.getSource().hasTop())`. On line 38 of your code we'll need to call `this.getSource().next()` instead of `this.next()`.

The iterator interface is documented, but there hasn't been a definitive go-to for making one. I've been drafting a blog post, but since it doesn't exist yet, hopefully the following will suffice.

The lifetime of an iterator is (usually) as follows:

(1) A new instance is created via Class.newInstance (so a no-args constructor is needed).

(2) init() is called. This allows users to configure the iterator, set its source, and possibly check the environment. We can also call `deepCopy` on the source if we want to have multiple sources (we'd do this if we wanted to do a merge read out of multiple column families within a row).

(3) seek() is called. This gets our readers to the correct positions in the data that are within the scan range the user requested, as well as turning column families on or off. The name should be reminiscent of seeking to some key on disk.

(4) hasTop() is called. If true, that means we have data, and the iterator has a key/value pair that can be retrieved by calling getTopKey() and getTopValue(). If false, we're done because there's no data to return.

(5) next() is called. This will attempt to find a new top key and value. We go back to (4) to see if next was successful in finding a new top key/value and will repeat until the client is satisfied or hasTop() returns false.

You can kind of make a state machine out of those steps, where we loop between (4) and (5) until there's no data. There are more advanced workflows where next() can be reading from multiple sources, as well as seeking them to different positions in the tablet.
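The five lifecycle steps William lists can be modelled with a toy driver in plain Java. This is a sketch under stated assumptions, not Accumulo's actual code path: `SimpleIterator`, `ListBackedIterator`, and `scan` are hypothetical names, keys are plain strings, and seek takes a single start key instead of a range and column families.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {
    /** Hypothetical, simplified iterator contract covering steps (2) to (5). */
    interface SimpleIterator {
        void init(List<String> sortedKeys); // (2) configure and attach the data
        void seek(String startKey);         // (3) position at the first key >= startKey
        boolean hasTop();                   // (4) is there a current entry?
        String getTopKey();
        void next();                        // (5) advance to the next entry
    }

    /** List-backed implementation; the no-args constructor is needed for step (1). */
    public static class ListBackedIterator implements SimpleIterator {
        private List<String> keys = new ArrayList<>();
        private int pos = 0;

        public ListBackedIterator() {}

        public void init(List<String> sortedKeys) { this.keys = sortedKeys; }

        public void seek(String startKey) {
            pos = 0;
            while (pos < keys.size() && keys.get(pos).compareTo(startKey) < 0) {
                pos++;
            }
        }

        public boolean hasTop() { return pos < keys.size(); }
        public String getTopKey() { return keys.get(pos); }
        public void next() { pos++; }
    }

    /** Drives an iterator through steps (1) to (5) and collects the keys it returns. */
    static List<String> scan(List<String> sortedKeys, String startKey) {
        SimpleIterator it;
        try {
            it = ListBackedIterator.class.getDeclaredConstructor().newInstance(); // (1)
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable no-args constructor", e);
        }
        it.init(sortedKeys);            // (2)
        it.seek(startKey);              // (3)
        List<String> results = new ArrayList<>();
        while (it.hasTop()) {           // (4)
            results.add(it.getTopKey());
            it.next();                  // (5), then back to (4)
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(scan(Arrays.asList("a", "b", "c", "d"), "b")); // prints [b, c, d]
    }
}
```

The loop between (4) and (5) is the state machine William mentions: the caller only ever asks hasTop() after a seek() or a next(), so an iterator has to have its top key and value ready at both of those points.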
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Herp... serves me right for not setting up a proper test case. I think you need to override seek as well:

    @Override
    public void seek(...) throws IOException {
        super.seek(...);
        next();
    }

I think I just realized the wrapping iterator could use some cleanup, because this isn't obvious. Basically, after the wrapping iterator's seek is called, it never calls the implementor's next() to actually set up the first top key and value.
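To see why the seek override matters, here is a small plain-Java model of a counting wrapper. It does not use the Accumulo classes: `Source`, `Counter`, and the `primeOnSeek` flag are hypothetical and exist only to compare the two behaviours side by side. When seek() does not call next(), the wrapper never computes a top value, so the first hasTop() is false and the scan returns nothing.

```java
import java.util.Arrays;
import java.util.List;

public class SeekPrimingSketch {
    /** Hypothetical list-backed source with a seek/hasTop/next protocol. */
    static class Source {
        private final List<String> values;
        private int pos = 0;
        Source(List<String> values) { this.values = values; }
        void seek() { pos = 0; }
        boolean hasTop() { return pos < values.size(); }
        void next() { pos++; }
    }

    /** Counting wrapper: its top value is the number of entries read from the source. */
    static class Counter {
        private final Source source;
        private final boolean primeOnSeek;
        private Long topValue = null; // null means "no top"

        Counter(Source source, boolean primeOnSeek) {
            this.source = source;
            this.primeOnSeek = primeOnSeek;
        }

        void seek() {
            source.seek();
            topValue = null;
            if (primeOnSeek) {
                next(); // set up the first top value, as in the seek override
            }
        }

        void next() {
            topValue = null;
            long count = 0;
            // Loop on the source's hasTop(); this.hasTop() is false at this point.
            while (source.hasTop()) {
                count++;
                source.next();
            }
            if (count > 0) {
                topValue = count;
            }
        }

        boolean hasTop() { return topValue != null; }
        long getTopValue() { return topValue; }
    }

    /** Returns the count a scan would see, or -1 if the iterator reports no top. */
    static long scan(List<String> values, boolean primeOnSeek) {
        Counter counter = new Counter(new Source(values), primeOnSeek);
        counter.seek();
        return counter.hasTop() ? counter.getTopValue() : -1;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("v1", "v2");
        System.out.println(scan(data, false)); // prints -1 (no top, so the scan is empty)
        System.out.println(scan(data, true));  // prints 2
    }
}
```

The unprimed case mirrors the "0 entries" line in the tserver log from earlier in the thread: the data is there, but the wrapper never exposes a top key and value for the scan to return.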
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
That worked ;) - Thanks! What a journey...

I like Accumulo's architecture and promise, but the difficulty in querying it (lack of documentation, conventions) is a major concern, and I'd imagine it has to have an impact on adoption. I'm curious whether there have been any conversations around changing the iterator interface, which is still confusing to me.

Let me know how I can help!
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
There's been some mention of a desire to rethink the Iterator interface, as it has some deficiencies (notably the lack of a cleanup step before the iterators are torn down), but no one has stated that they're actively working on this.

As for getting better documentation with respect to conventions: let us know where the Accumulo documentation falls short (and give us patches to fix the documentation :D). Additionally, write up your own findings from problems that you've run into. It's the entire community (users specifically) that we need to help encourage to grow. Even things as simple as "how do I count entries in an iterator" are big, as you are now an expert on the subject :)
Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Hi, All.

I'm curious what the best practices are around persisting complex types/data in Accumulo (and aggregating on fields within them).

Let's say I have (row, column family, column qualifier, value):

    A foo MyHugeAvroObject(count=2)
    A foo MyHugeAvroObject(count=3)

Let's say MyHugeAvroObject has a field Integer count with the values above. What is the best way to aggregate on row, column family, column qualifier by count? In my above example:

    A foo 5

The TypedValueCombiner.typedReduce method can deserialize any V, in my case MyHugeAvroObject, but it needs to return a value of type V. What are the best practices for deeply nested/complex objects? It's not always straightforward to map a complex Avro type into Row - Column Family - Column Qualifier.

Rather than using a TypedValueCombiner, I looked into using an Aggregator (which appears deprecated as of 1.4), which appears to let me return arbitrary values, but despite running setiter, my aggregator doesn't seem to do anything.

I also tried implementing a WrappingIterator, which also appears to allow me to return arbitrary values (such as Accumulo's CountingIterator), but I get cryptic errors when trying to setiter (I'm on Accumulo 1.6):

    root@dev kyt setiter -t kyt -scan -p 10 -n countingIter -class org.apache.accumulo.core.iterators.system.CountingIterator
    2014-07-14 11:12:55,623 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.core.iterators.system.CountingIterator

This is odd because other included implementations of WrappingIterator seem to work (perhaps the implementation of CountingIterator is dated):

    root@dev kyt setiter -t kyt -scan -p 10 -n deletingIterator -class org.apache.accumulo.core.iterators.system.DeletingIterator
    The iterator class does not implement OptionDescriber. Consider this for better iterator configuration using this setiter command.
    Name for iterator (enter to skip):

All in all, how can I aggregate simple values, like counters, from rows with complex Avro objects as Values without having to add aggregation fields to these Value objects?

Thanks!
-Mike
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Hi Mike!

The Combiner interface is only for aggregating keys within a single row. You can probably get away with implementing your combining logic in a WrappingIterator that reads across all the rows in a given tablet.

To do some combine/fold/reduce operation, Accumulo needs the input type to be the same as the output type. The combiner doesn't have a notion of a present type (as you'd see in something like Algebird's Groups), but you can use another iterator to perform your transformation.

If you wanted to extract the count field from your Avro object, you could write a new Iterator that took your Avro object, extracted the desired field, and returned it as its top value. You can then set this iterator as the source of the aggregator, either programmatically or by wrapping the source object passed to the aggregator in its SortedKeyValueIterator#init call.

This is a bit inefficient, as you'd have to serialize to a Value and then immediately deserialize it in the iterator above it. You could mitigate this by exposing a method that would get the extracted value before serializing it.

This kind of counting also requires client-side logic to do a final combine operation, since the aggregations from all the tservers are partial results.

I believe that CountingIterator is not meant for user consumption, but I do not know if that's related to your issue in trying to use it from the shell. In previous versions of Accumulo, iterators set through the shell were required to implement OptionDescriber. Many default iterators do not implement it, and thus can't be set in the shell.
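The two-stage approach William describes (an extracting step below the combiner, then a final combine on the client) can be sketched in plain Java. Everything here is a hypothetical stand-in: `MyHugeAvroObject` is reduced to a class with a count field, and `extractCounts`, `sum`, and `partialForTablet` are illustrative names, not Accumulo or Avro APIs. The point is only that the summing step sees Long in and Long out, which is the constraint the combiner imposes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractThenCombineSketch {
    /** Hypothetical stand-in for the large Avro object; only count is aggregated. */
    static class MyHugeAvroObject {
        final int count;
        final String payload;
        MyHugeAvroObject(int count, String payload) {
            this.count = count;
            this.payload = payload;
        }
    }

    /** Lower stage: replace each complex value with just its count field. */
    static List<Long> extractCounts(List<MyHugeAvroObject> values) {
        return values.stream().map(v -> (long) v.count).collect(Collectors.toList());
    }

    /** Upper stage: a reduce whose input and output types are both Long. */
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** What one tablet server would return: a partial sum over its own data. */
    static long partialForTablet(List<MyHugeAvroObject> values) {
        return sum(extractCounts(values));
    }

    public static void main(String[] args) {
        long tabletOne = partialForTablet(Arrays.asList(
            new MyHugeAvroObject(2, "..."), new MyHugeAvroObject(3, "...")));
        long tabletTwo = partialForTablet(Arrays.asList(
            new MyHugeAvroObject(4, "...")));
        // Client side: the per-tablet results are partial, so combine them once more.
        long total = sum(Arrays.asList(tabletOne, tabletTwo));
        System.out.println(tabletOne + " " + tabletTwo + " " + total); // prints 5 4 9
    }
}
```

Because summing is associative, the same `sum` can serve both as the server-side reduce and as the client-side final combine; an aggregate such as an average would need to carry a sum and a count through both stages instead.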
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
For a bit of psuedocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR I wrote that up real quick in a text editor-- it won't compile or anything, but should point you in the right direction. On Mon, Jul 14, 2014 at 3:44 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Hi Mike! The Combiner interface is only for aggregating keys within a single row. You can probably get away with implementing your combining logic in a WrappingIterator that reads across all the rows in a given tablet. To do some combine/fold/reduce operation, Accumulo needs the input type to be the same as the output type. The combiner doesn't have a notion of a present type (as you'd see in something like Algebird's Groups), but you can use another iterator to perform your transformation. If you wanted to extract the count field from your Avro object, you could write a new Iterator that took your Avro object, extracted the desired field, and returned it as its top value. You can then set this iterator as the source of the aggregator, either programmatically or via by wrapping the source object passed to the aggregator in its SortedKeyValueIterator#init call. This is a bit inefficient as you'd have to serialize to a Value and then immediately deserialize it in the iterator above it. You could mitigate this by exposing a method that would get the extracted value before serializing it. This kind of counting also requires client side logic to do a final combine operation, since the aggregations from all the tservers are partial results. I believe that CountingIterator is not meant for user consumption, but I do not know if it's related to your issue in trying to use it from the shell. Iterators set through the shell, in previous versions of Accumulo, have a requirement to implement OptionDescriber. Many default iterators do not implement this, and thus can't set in the shell. 
On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss michael.m...@gmail.com wrote:

Hi, All.

I'm curious what the best practices are around persisting complex types/data in Accumulo (and aggregating on fields within them).

Let's say I have (row, column family, column qualifier, value):

A foo MyHugeAvroObject(count=2)
A foo MyHugeAvroObject(count=3)

Let's say MyHugeAvroObject has a field Integer count with the values above. What is the best way to aggregate on row, column family, column qualifier by count? In my above example:

A foo 5

The TypedValueCombiner.typedReduce method can deserialize any V, in my case MyHugeAvroObject, but it needs to return a value of type V. What are the best practices for deeply nested/complex objects? It's not always straightforward to map a complex Avro type into Row - Column Family - Column Qualifier.

Rather than using a TypedValueCombiner, I looked into using an Aggregator (which appears deprecated as of 1.4), which appears to let me return arbitrary values, but despite running setiter, my aggregator doesn't seem to do anything.

I also tried looking at implementing a WrappingIterator, which also appears to allow me to return arbitrary values (such as Accumulo's CountingIterator), but I get cryptic errors when trying to setiter. I'm on Accumulo 1.6:

root@dev kyt setiter -t kyt -scan -p 10 -n countingIter -class org.apache.accumulo.core.iterators.system.CountingIterator
2014-07-14 11:12:55,623 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.core.iterators.system.CountingIterator

This is odd because other included implementations of WrappingIterator seem to work (perhaps the implementation of CountingIterator is dated):

root@dev kyt setiter -t kyt -scan -p 10 -n deletingIterator -class org.apache.accumulo.core.iterators.system.DeletingIterator
The iterator class does not implement OptionDescriber. Consider this for better iterator configuration using this setiter command.
Name for iterator (enter to skip):

All in all, how can I aggregate simple values, like counters, from rows with complex Avro objects as Values, without having to add aggregation fields to these Value objects?

Thanks!
-Mike
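The point made earlier in this thread, that server-side aggregations are only partial results and the client has to do a final combine, can be illustrated with a very small sketch. The class name `PartialCountMerger` and the sample numbers are made up for illustration. It assumes the scan hands back one count per tablet (or per batch), and it does not use the Accumulo client API.

```java
import java.util.Arrays;
import java.util.List;

final class PartialCountMerger {
    // Final client-side combine: add up the partial counts returned by the scan.
    static long combine(Iterable<Long> partialCounts) {
        long total = 0L;
        for (long partial : partialCounts) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical partial counts, one per tablet that served the scan.
        List<Long> partials = Arrays.asList(4L, 7L, 1L);
        System.out.println(combine(partials));
    }
}
```

With a real scanner, the values being summed would be whatever the counting iterator placed in each returned Value, decoded back into a number first.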
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Thanks, William. I was just hitting you up for an example :)

I adapted your pseudocode (http://pastebin.com/ufPJq0g3), but noticed that this.source in your example didn't have visibility. Did I work around it correctly?

When I add my iterator to my table and run scan from the shell, it returns nothing. What should I expect here? In general, I've found the iterator interface pretty confusing and haven't spent the time wrapping my head around it yet. Any documentation or examples (beyond what I could find on the site or in the code) would be appreciated!

root@dev table pojo
root@dev pojo listiter -scan -t pojo
-
-Iterator counter, scan scope options:
-iteratorPriority = 10
-iteratorClassName = iterators.Counter
-
root@dev pojo scan
root@dev pojo

Best,
-Mike
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Ah, an artifact of me just willy-nilly writing an iterator :) Any reference to `this.source` should be replaced with `this.getSource()`.

In `next()`, your workaround ends up calling `this.hasTop()` as the while loop condition. It will always return false because two lines up we set `top_key` to null. We need to make sure that the source iterator has a top, because we want to read data from it, so we'll have to change the loop condition to `while(this.getSource().hasTop())`. On line 38 of your code we'll need to call `this.getSource().next()` instead of `this.next()`.

The iterator interface is documented, but there hasn't been a definitive go-to guide for writing one. I've been drafting a blog post, but since it doesn't exist yet, hopefully the following will suffice.

The lifetime of an iterator is (usually) as follows:

(1) A new instance is created via Class.newInstance (so a no-args constructor is needed).

(2) init() is called. This allows users to configure the iterator, set its source, and possibly check the environment. We can also call `deepCopy` on the source if we want to have multiple sources (we'd do this if we wanted to do a merge read out of multiple column families within a row).

(3) seek() is called. This gets our readers to the correct positions in the data that are within the scan range the user requested, as well as turning column families on or off. The name should be reminiscent of seeking to some key on disk.

(4) hasTop() is called. If true, that means we have data, and the iterator has a key/value pair that can be retrieved by calling getTopKey() and getTopValue(). If false, we're done because there's no data to return.

(5) next() is called. This will attempt to find a new top key and value. We go back to (4) to see if next() was successful in finding a new top key/value, and repeat until the client is satisfied or hasTop() returns false.

You can kind of make a state machine out of those steps, where we loop between (4) and (5) until there's no data. There are more advanced workflows where next() can be reading from multiple sources, as well as seeking them to different positions in the tablet.
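The seek/hasTop/next loop and the `getSource()` fix discussed in this thread can be modelled in a few lines of plain Java. This is a simplified sketch, not the Accumulo API: `MiniIterator`, `ListSource`, `CountingWrapper` and `LifecycleDemo` are hypothetical names, keys and values are plain Strings, and init(), ranges and column families are left out. Returning the last key seen as the key for the count is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-in for the iterator contract (not SortedKeyValueIterator).
interface MiniIterator {
    void seek();            // position at the first entry
    boolean hasTop();       // is there a current entry?
    String getTopKey();
    String getTopValue();
    void next();            // advance to the next entry
}

// In-memory source backed by a list of {key, value} pairs.
final class ListSource implements MiniIterator {
    private final List<String[]> entries;
    private int pos;

    ListSource(List<String[]> entries) { this.entries = entries; }

    public void seek() { pos = 0; }
    public boolean hasTop() { return pos < entries.size(); }
    public String getTopKey() { return entries.get(pos)[0]; }
    public String getTopValue() { return entries.get(pos)[1]; }
    public void next() { pos++; }
}

// Wrapper that drains its source and exposes a single entry holding the count.
final class CountingWrapper implements MiniIterator {
    private final MiniIterator source;
    private String topKey;
    private String topValue;

    CountingWrapper(MiniIterator source) { this.source = source; }

    public void seek() {
        source.seek();
        next(); // compute the first (and only) top entry
    }

    public boolean hasTop() { return topKey != null; }
    public String getTopKey() { return topKey; }
    public String getTopValue() { return topValue; }

    public void next() {
        topKey = null;
        topValue = null;
        long count = 0;
        String lastKey = null;
        // Loop on the SOURCE's hasTop(); our own hasTop() is false here
        // because topKey was just cleared.
        while (source.hasTop()) {
            lastKey = source.getTopKey();
            count++;
            source.next(); // advance the source, not this wrapper
        }
        if (count > 0) {
            topKey = lastKey;
            topValue = Long.toString(count);
        }
    }
}

final class LifecycleDemo {
    // Drives steps (3) to (5): seek once, then alternate hasTop()/next().
    static List<String> scan(MiniIterator it) {
        List<String> out = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            out.add(it.getTopKey() + " " + it.getTopValue());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[] {"row1", "v1"},
            new String[] {"row1", "v2"},
            new String[] {"row2", "v3"});
        System.out.println(scan(new CountingWrapper(new ListSource(data))));
    }
}
```

If the loop in `CountingWrapper.next()` tested the wrapper's own `hasTop()` instead, it would never run, and the scan would come back empty, which matches the symptom reported from the shell.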
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Hmm... Still doesn't return anything from the shell. http://pastebin.com/ndRhspf8

Any thoughts? What's the best way to debug these?
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
A quick sanity check is to make sure you have data in the table and that you can read the data without your iterator (I've thought I had a bug because I didn't have proper visibilities more times than I'd like to admit).

Alternatively, you can also enable remote debugging via Eclipse into the TabletServer, which might help you understand more of what's going on. There are lots of articles on how to set this up [1]. In short: add -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8000 to ACCUMULO_TSERVER_OPTS in accumulo-env.sh, restart the tserver, connect Eclipse to port 8000 via the Debug configuration menu, set a breakpoint in your init, seek and next methods, and run `scan` in the shell.

[1] http://javarevisited.blogspot.com/2011/02/how-to-setup-remote-debugging-in.html
Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Anything in your Tserver log? I think you should just rethrow that IOException from your source's next() method, since they're usually not recoverable (i.e., just make Counter#next throw IOException).
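To illustrate the advice about rethrowing, here is a small plain-Java sketch comparing a counter that swallows an IOException from its source with one that lets it propagate. All names (`FailingSource`, `BrokenSource`, `SwallowingCounter`, `RethrowingCounter`, `RethrowDemo`) are hypothetical and the failure is simulated. It assumes the original code caught the exception inside next(), which the thread implies but does not show.

```java
import java.io.IOException;

// Hypothetical minimal source interface; not an Accumulo type.
interface FailingSource {
    boolean hasTop();
    void next() throws IOException;
}

// A source that always fails when advanced, to simulate an unrecoverable read error.
final class BrokenSource implements FailingSource {
    public boolean hasTop() { return true; }
    public void next() throws IOException {
        throw new IOException("simulated read failure");
    }
}

final class SwallowingCounter {
    private final FailingSource source;
    long count;

    SwallowingCounter(FailingSource source) { this.source = source; }

    // Catches and hides the failure: the caller just sees an empty result.
    void next() {
        try {
            while (source.hasTop()) {
                source.next();
                count++;
            }
        } catch (IOException e) {
            // swallowed: nothing is reported to the caller
        }
    }
}

final class RethrowingCounter {
    private final FailingSource source;
    long count;

    RethrowingCounter(FailingSource source) { this.source = source; }

    // Declares the exception so the failure reaches the caller (and its logs).
    void next() throws IOException {
        while (source.hasTop()) {
            source.next();
            count++;
        }
    }
}

final class RethrowDemo {
    static String describe() {
        SwallowingCounter quiet = new SwallowingCounter(new BrokenSource());
        quiet.next();
        String first = "swallowing: count=" + quiet.count;

        String second;
        try {
            new RethrowingCounter(new BrokenSource()).next();
            second = "rethrowing: no error";
        } catch (IOException e) {
            second = "rethrowing: " + e.getMessage();
        }
        return first + "; " + second;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The swallowing variant looks exactly like "scan returns nothing", while the rethrowing variant produces an error that can be found in the logs.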