That worked ;) - Thanks!
What a journey...
I like Accumulo's architecture and promise, but the difficulty of querying it (lack of documentation and conventions) is a major concern, and I'd imagine it has an impact on adoption. I'm curious whether there have been any conversations about changing the iterator interface, which is still confusing to me. Let me know how I can help!
On Tue, Jul 15, 2014 at 12:03 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
Herp... serves me right for not setting up a proper test case.
I think you need to override seek as well:
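import java.io.IOException;
import java.util.Collection;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Range;
@Override
public void seek(Range range, Collection<ByteSequence> columnFamilies,
    boolean inclusive) throws IOException {
  super.seek(range, columnFamilies, inclusive); // only seeks the source
  next(); // set up the first top key and value
}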
I just realized WrappingIterator could use some cleanup, because this isn't obvious. After WrappingIterator's seek() is called, it never calls the implementor's next() to actually set up the first top key and value.
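To see why the extra next() matters, here is a self-contained model with no Accumulo dependency. `ListSource` and `SummingWrapper` are hypothetical stand-ins, not Accumulo classes: like a Counter whose hasTop() checks its own top field, the wrapper reports no data until next() has run once, so its seek() has to prime it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the wrapped source; not the Accumulo API.
class ListSource {
  private final List<Map.Entry<String, Long>> data;
  private int pos;

  ListSource(List<Map.Entry<String, Long>> data) { this.data = data; }

  // The real seek also takes a Range, column families and an inclusive flag.
  void seek() { pos = 0; }
  boolean hasTop() { return pos < data.size(); }
  Map.Entry<String, Long> getTop() { return data.get(pos); }
  void next() { pos++; }
}

// Models a WrappingIterator subclass whose hasTop() checks its own field,
// which only next() fills in.
class SummingWrapper {
  private final ListSource source;
  private Long topValue; // null means "no top"

  SummingWrapper(ListSource source) { this.source = source; }

  void seek() {
    source.seek(); // like WrappingIterator#seek: only the source is positioned
    next();        // the fix: compute the first top value right away
  }

  boolean hasTop() { return topValue != null; }
  long getTopValue() { return topValue; }

  // Folds every remaining source entry into a single top value.
  void next() {
    topValue = null;
    long sum = 0;
    boolean sawData = false;
    while (source.hasTop()) {
      sum += source.getTop().getValue();
      sawData = true;
      source.next();
    }
    if (sawData) {
      topValue = sum;
    }
  }
}
```

Dropping the next() call from seek() makes hasTop() return false straight away, which would look like the empty scan in the log further down.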
On Tue, Jul 15, 2014 at 9:50 AM, Michael Moss <michael.m...@gmail.com> wrote:
I set up debugging and am rethrowing the exception. What's strange is that although the iterator instance is properly set to iterators.Counter (my implementation), my breakpoints in it aren't being hit, only the ones in the parent classes (WrappingIterator and SortedKeyValueIterator).
I have two rows in the table. When I scan with no iterator:
2014-07-15 06:46:26,577 [Audit ] INFO : operation: permitted; user: root; action: scan; targetTable: pojo; authorizations: public,; range: (-inf,+inf); columns: []; iterators: []; iteratorOptions: {};
2014-07-15 06:46:26,589 [tserver.TabletServer] DEBUG: ScanSess tid 10.0.2.15:45073 8 *2 entries* in 0.01 secs, nbTimes = [7 7 7.00 1]
When I scan with the iterator (0 entries?):
2014-07-15 06:45:58,036 [Audit ] INFO : operation: permitted; user: root; action: scan; targetTable: pojo; authorizations: public,; range: (-inf,+inf); columns: []; iterators: []; iteratorOptions: {};
2014-07-15 06:45:58,047 [tserver.TabletServer] DEBUG: ScanSess tid 10.0.2.15:44992 8 *0 entries* in 0.01 secs, nbTimes = [6 6 6.00 1]
No exceptions otherwise. Really appreciate all the ongoing help.
Best,
-Mike
On Mon, Jul 14, 2014 at 6:40 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
Anything in your TServer log? I think you should just rethrow the IOException from your source's next() method, since those are usually not recoverable (i.e., just make Counter#next throw IOException).
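A self-contained sketch of the difference. `FlakySource`, `SwallowingCounter` and `RethrowingCounter` are hypothetical, not Accumulo classes: swallowing the exception turns a read failure into a silently short result, while declaring it on next() lets it propagate to the caller (in Accumulo, the tablet server, where it should then show up in its log).

```java
import java.io.IOException;

// Hypothetical source whose next() throws once it reaches position failAt.
class FlakySource {
  private final int size;
  private final int failAt;
  private int pos;

  FlakySource(int size, int failAt) {
    this.size = size;
    this.failAt = failAt;
  }

  boolean hasTop() { return pos < size; }

  void next() throws IOException {
    if (pos == failAt) {
      throw new IOException("simulated read failure at position " + pos);
    }
    pos++;
  }
}

// Anti-pattern: the failure is swallowed and a partial count looks valid.
class SwallowingCounter {
  private final FlakySource source;
  long count;

  SwallowingCounter(FlakySource source) { this.source = source; }

  void next() {
    try {
      while (source.hasTop()) {
        count++;
        source.next();
      }
    } catch (IOException e) {
      // nothing logged, nothing rethrown
    }
  }
}

// Suggested shape: declare the exception and let it propagate.
class RethrowingCounter {
  private final FlakySource source;
  long count;

  RethrowingCounter(FlakySource source) { this.source = source; }

  void next() throws IOException {
    while (source.hasTop()) {
      count++;
      source.next();
    }
  }
}
```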
On Mon, Jul 14, 2014 at 5:48 PM, Josh Elser <josh.el...@gmail.com> wrote:
A quick sanity check is to make sure you have data in
the table and that you can read the data without your
iterator (I've thought I had a bug because I didn't have
proper visibilities more times than I'd like to admit).
You can also enable remote debugging from Eclipse into the TabletServer, which might help you understand more of what's going on. There are lots of articles on how to set this up [1]. In short, add -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8000 to ACCUMULO_TSERVER_OPTS in accumulo-env.sh, restart the tserver, connect Eclipse to port 8000 via the Debug configuration menu, set breakpoints in your init, seek and next methods, and run `scan` in the shell.
[1] http://javarevisited.blogspot.com/2011/02/how-to-setup-remote-debugging-in.html
On 7/14/14, 5:33 PM, Michael Moss wrote:
Hmm... still doesn't return anything from the shell.
http://pastebin.com/ndRhspf8
Any thoughts? What's the best way to debug these?
On Mon, Jul 14, 2014 at 5:14 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
Ah, an artifact of me just willy-nilly writing an iterator :) Any reference to `this.source` should be replaced with `this.getSource()`. In `next()`, your workaround ends up calling `this.hasTop()` as the while loop condition. It will always return false, because two lines up we set `top_key` to null. We need to make sure that the source iterator has a top, because we want to read data from it, so we'll have to change the loop condition to `while (this.getSource().hasTop())`. On line 38 of your code we'll need to call `this.getSource().next()` instead of `this.next()`.
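The two loop conditions can be compared in a small model with no Accumulo dependency. `RowSource` and `CounterModel` (with its `checkSource` switch) are hypothetical, not Mike's actual code: once next() clears its own top key, checking its own hasTop() ends the loop before any data is read, while checking the source's hasTop() reads everything.

```java
import java.util.List;

// Hypothetical stand-in for getSource(): a cursor over sorted row keys.
class RowSource {
  private final List<String> rows;
  private int pos;

  RowSource(List<String> rows) { this.rows = rows; }

  boolean hasTop() { return pos < rows.size(); }
  void next() { pos++; }
}

// Hypothetical model of a counting next(); checkSource selects the loop condition.
class CounterModel {
  private final RowSource source;
  private final boolean checkSource;
  private String topKey;
  private long topValue;

  CounterModel(RowSource source, boolean checkSource) {
    this.source = source;
    this.checkSource = checkSource;
  }

  boolean hasTop() { return topKey != null; }
  long getTopValue() { return topValue; }

  void next() {
    topKey = null; // cleared before searching for a new top
    long count = 0;
    // Broken: this.hasTop() is always false at this point.
    // Fixed: ask the source whether it still has data.
    while (checkSource ? source.hasTop() : hasTop()) {
      count++;
      source.next(); // advance the source, not this.next()
    }
    if (count > 0) {
      topKey = "count"; // hypothetical key for the single result
      topValue = count;
    }
  }
}
```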
The iterator interface is documented, but there hasn't been a definitive go-to guide for writing one. I've been drafting a blog post, but since it doesn't exist yet, hopefully the following will suffice.
The lifetime of an iterator is (usually) as follows:
(1) A new instance is created via Class.newInstance (so a no-args constructor is needed).
(2) init is called. This allows users to configure the iterator, set its source, and possibly check the environment. We can also call `deepCopy` on the source if we want to have multiple sources (we'd do this if we wanted to do a merge read out of multiple column families within a row).
(3) seek() is called. This moves our readers to the correct positions in the data that are within the scan range the user requested, as well as turning column families on or off. The name should be reminiscent of seeking to some key on disk.
(4) hasTop() is called. If true, we have data, and the iterator has a key/value pair that can be retrieved by calling getTopKey() and getTopValue(). If false, we're done because there's no data to return.
(5) next() is called. This will attempt to find a new top key and value. We go back to (4) to see whether next() succeeded in finding a new top key/value, and repeat until the client is satisfied or hasTop() returns false.
You can think of those steps as a state machine where we loop between (4) and (5) until there's no data. There are more advanced workflows where next() reads from multiple sources and seeks them to different positions in the tablet.
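The lifecycle above can be sketched as the loop a caller runs. This is a self-contained model rather than Accumulo code: `KvIterator` is a hypothetical, cut-down stand-in for SortedKeyValueIterator (its seek() takes a start key instead of a Range, column families and an inclusive flag, and there is no deepCopy), `PassThrough` is a trivial implementation, and `Driver.drive` plays the tablet server's part. It uses getDeclaredConstructor().newInstance(), the non-deprecated equivalent of Class.newInstance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, cut-down stand-in for SortedKeyValueIterator.
interface KvIterator {
  void init(TreeMap<String, String> data, Map<String, String> options);
  void seek(String startKey);
  boolean hasTop();
  String getTopKey();
  String getTopValue();
  void next();
}

// A trivial iterator that returns every entry at or after the seek key.
class PassThrough implements KvIterator {
  private TreeMap<String, String> data;
  private Map.Entry<String, String> top;

  public PassThrough() {} // step (1): a no-args constructor is required

  public void init(TreeMap<String, String> data, Map<String, String> options) {
    this.data = data; // step (2): keep the source, read any options
  }

  public void seek(String startKey) {
    top = data.ceilingEntry(startKey); // step (3): position inside the range
  }

  public boolean hasTop() { return top != null; } // step (4)
  public String getTopKey() { return top.getKey(); }
  public String getTopValue() { return top.getValue(); }

  public void next() { top = data.higherEntry(top.getKey()); } // step (5)
}

class Driver {
  // Runs steps (1) to (5) and collects every key the iterator returns.
  static List<String> drive(Class<? extends KvIterator> cls,
      TreeMap<String, String> data, String startKey) throws Exception {
    KvIterator it = cls.getDeclaredConstructor().newInstance();
    it.init(data, Map.of());
    it.seek(startKey);
    List<String> keys = new ArrayList<>();
    while (it.hasTop()) { // loop between (4) and (5)
      keys.add(it.getTopKey());
      it.next();
    }
    return keys;
  }
}
```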
On Mon, Jul 14, 2014 at 4:51 PM, Michael Moss <michael.m...@gmail.com> wrote:
Thanks, William. I was just hitting you up for an example :)
I adapted your pseudocode (http://pastebin.com/ufPJq0g3), but noticed that "this.source" in your example wasn't visible from my class. Did I work around it correctly?
When I add my iterator to my table and run scan from the shell, it returns nothing. What should I expect here? In general I've found the iterator interface pretty confusing and haven't spent the time wrapping my head around it yet. Any documentation or examples (beyond what I could find on the site or in the code) would be appreciated!
root@dev> table pojo
root@dev pojo> listiter -scan -t pojo
-
- Iterator counter, scan scope options:
- iteratorPriority = 10
- iteratorClassName = iterators.Counter
-
root@dev pojo> scan
root@dev pojo>
Best,
-Mike
On Mon, Jul 14, 2014 at 4:07 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
For a bit of pseudocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR
I wrote that up real quick in a text editor, so it won't compile or anything, but it should point you in the right direction.
On Mon, Jul 14, 2014 at 3:44 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
Hi Mike!
The Combiner interface is only for aggregating values within a single row (strictly, the timestamped versions of a single key). You can probably get away with implementing your combining logic in a WrappingIterator that reads across all the rows in a given tablet.
To do some combine/fold/reduce operation, Accumulo needs the input type to be the same as the output type. The combiner doesn't have a notion of a "present" type (as you'd see in something like Algebird's Groups), but you can use another iterator to perform your transformation.
If you wanted to extract the "count" field from your Avro object, you could write a new iterator that takes your Avro object, extracts the desired field, and returns it as its top value. You can then set this iterator as the source of the aggregator, either programmatically or by wrapping the source object passed to the aggregator in its SortedKeyValueIterator#init call.
This is a bit inefficient, as you'd have to serialize to a Value and then immediately deserialize it in the iterator above. You could mitigate this by exposing a method that returns the extracted value before it is serialized.
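A minimal sketch of that stacking, with no Accumulo or Avro dependency. `BigRecord` stands in for MyHugeAvroObject, and `ValueIterator`, `RecordSource`, `CountExtractor` and `SummingAggregator` are all hypothetical: the extractor wraps the record source and is handed to the aggregator as its source, much like wrapping the source passed to init(). Because everything stays in memory, the extractor hands back a plain Integer, in the spirit of exposing the extracted value instead of serializing it to a Value. For brevity the sketch ignores keys and sums every record it sees.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for MyHugeAvroObject.
class BigRecord {
  final int count;
  final String payload;

  BigRecord(int count, String payload) {
    this.count = count;
    this.payload = payload;
  }
}

// Hypothetical, cut-down iterator interface: values only, no keys.
interface ValueIterator<V> {
  boolean hasTop();
  V getTopValue();
  void next();
}

// Bottom of the stack: yields the stored records.
class RecordSource implements ValueIterator<BigRecord> {
  private final Iterator<BigRecord> it;
  private BigRecord top;

  RecordSource(List<BigRecord> records) {
    it = records.iterator();
    next(); // prime the first top value
  }

  public boolean hasTop() { return top != null; }
  public BigRecord getTopValue() { return top; }
  public void next() { top = it.hasNext() ? it.next() : null; }
}

// Middle: transforms each record into its "count" field.
class CountExtractor implements ValueIterator<Integer> {
  private final ValueIterator<BigRecord> source;

  CountExtractor(ValueIterator<BigRecord> source) { this.source = source; }

  public boolean hasTop() { return source.hasTop(); }
  public Integer getTopValue() { return source.getTopValue().count; }
  public void next() { source.next(); }
}

// Top: input and output types now match, so it can simply reduce.
class SummingAggregator {
  static long sum(ValueIterator<Integer> source) {
    long total = 0;
    for (; source.hasTop(); source.next()) {
      total += source.getTopValue();
    }
    return total;
  }
}
```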
This kind of counting also requires client-side logic to do a final combine operation, since the aggregations from all the tservers are partial results.
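On the client, that final combine is one more reduce over the partial results. A sketch, assuming each tablet's summing iterator returned one partial count per key; the `ClientCombine` class and the string keys are hypothetical, and with a real Scanner each partial would come from decoding a returned Value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClientCombine {
  // Sums partial counts per key, e.g. one partial per tablet for the same key.
  static Map<String, Long> combine(List<Map.Entry<String, Long>> partials) {
    Map<String, Long> totals = new HashMap<>();
    for (Map.Entry<String, Long> e : partials) {
      totals.merge(e.getKey(), e.getValue(), Long::sum);
    }
    return totals;
  }
}
```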
I believe that CountingIterator is not meant for user consumption, but I do not know if that's related to your issue in trying to use it from the shell. In previous versions of Accumulo, iterators set through the shell had to implement OptionDescriber. Many default iterators do not implement it, and thus can't be set in the shell.
On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <michael.m...@gmail.com> wrote:
Hi, All.
I'm curious what the best practices are around persisting complex types/data in Accumulo (and aggregating on fields within them).
Let's say I have (row, column family, column qualifier, value):
"A" "foo" "" MyHugeAvroObject(count=2)
"A" "foo" "" MyHugeAvroObject(count=3)
Let's say MyHugeAvroObject has a field "Integer count" with the values above. What is the best way to aggregate on row, column family, column qualifier by count? In my above example:
"A" "foo" "" 5
The TypedValueCombiner.typedReduce method can deserialize any "V", in my case MyHugeAvroObject, but it needs to return a value of type "V". What are the best practices for deeply nested/complex objects? It's not always straightforward to map a complex Avro type into Row -> Column Family -> Column Qualifier.
Rather than using a TypedValueCombiner, I looked into using an Aggregator (which appears to be deprecated as of 1.4), which seems to let me return arbitrary values, but despite running setiter, my aggregator doesn't seem to do anything.
I also tried implementing a WrappingIterator, which also appears to allow me to return arbitrary values (as Accumulo's CountingIterator does), but I get cryptic errors when trying to setiter. I'm on Accumulo 1.6:
root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class org.apache.accumulo.core.iterators.system.CountingIterator
2014-07-14 11:12:55,623 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.core.iterators.system.CountingIterator
This is odd because other included implementations of WrappingIterator seem to work (perhaps the implementation of CountingIterator is dated):
root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class org.apache.accumulo.core.iterators.system.DeletingIterator
The iterator class does not implement OptionDescriber. Consider this for better iterator configuration using this setiter command.
Name for iterator (enter to skip):
All in all, how can I aggregate simple values, like counters, from rows with complex Avro objects as Values without having to add aggregation fields to these Value objects?
Thanks!
-Mike