Hi, All.

I'm curious what the best practices are around persisting complex
types/data in Accumulo (and aggregating on fields within them).

Let's say I have (row, column family, column qualifier, value):
"A" "foo" "" MyHugeAvroObject(count=2)
"A" "foo" "" MyHugeAvroObject(count=3)

Let's say MyHugeAvroObject has a field "Integer count" with the values
above.

What is the best way to aggregate count by row, column family, and column
qualifier? For my example above, the result would be:
"A" "foo" "" 5

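To make the target concrete, here is roughly the result I'm after, sketched
in plain Java with no Accumulo classes involved. Entry, sumCounts and
CountAggregationSketch are made-up names for illustration, and Entry only
models the count field of the deserialized value:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CountAggregationSketch {

    /** Hypothetical stand-in for one entry whose value is a MyHugeAvroObject. */
    public static final class Entry {
        final String row;
        final String family;
        final String qualifier;
        final int count; // the "Integer count" field of the deserialized value

        public Entry(String row, String family, String qualifier, int count) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.count = count;
        }
    }

    /** Sums count per (row, column family, column qualifier). */
    public static Map<List<String>, Integer> sumCounts(Iterable<Entry> entries) {
        Map<List<String>, Integer> totals = new LinkedHashMap<>();
        for (Entry e : entries) {
            // A List works as a map key because it defines equals/hashCode by content.
            List<String> key = Arrays.asList(e.row, e.family, e.qualifier);
            totals.merge(key, e.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> totals = sumCounts(Arrays.asList(
                new Entry("A", "foo", "", 2),
                new Entry("A", "foo", "", 3)));
        System.out.println(totals.get(Arrays.asList("A", "foo", ""))); // prints 5
    }
}
```

The question is how to get the same effect server-side in an iterator,
where the stored values are the serialized Avro objects rather than plain
integers.
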
The TypedValueCombiner.typedReduce method can deserialize any "V", in my
case MyHugeAvroObject, but it needs to return a value of type "V". What are
the best practices for deeply nested/complex objects? It's not always
straightforward to map a complex Avro type into Row -> Column Family ->
Column Qualifier.
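
To illustrate the constraint, here is a minimal sketch of the only shape I
can see working under a "V in, V out" reduce: sum the counts and wrap the
total in a fresh object of the same type. TypedReducer is a made-up
interface that mirrors just the return-type restriction (it is not the
Accumulo class, and it leaves out the Key argument), and MyHugeAvroObject
here is a stand-in with only the count field:

```java
import java.util.Arrays;
import java.util.Iterator;

class TypedReduceSketch {

    /** Hypothetical stand-in for the Avro-generated class; only count is modeled. */
    public static final class MyHugeAvroObject {
        private final int count;

        public MyHugeAvroObject(int count) {
            this.count = count;
        }

        public int getCount() {
            return count;
        }
    }

    /** Made-up interface: values of type V go in, a single V must come out. */
    public interface TypedReducer<V> {
        V typedReduce(Iterator<V> values);
    }

    /** Sums count, then returns the total inside a new object of type V. */
    public static final class CountSummingReducer implements TypedReducer<MyHugeAvroObject> {
        @Override
        public MyHugeAvroObject typedReduce(Iterator<MyHugeAvroObject> values) {
            int total = 0;
            while (values.hasNext()) {
                total += values.next().getCount();
            }
            // The result cannot be a bare integer; it has to be another V.
            return new MyHugeAvroObject(total);
        }
    }

    public static void main(String[] args) {
        MyHugeAvroObject reduced = new CountSummingReducer().typedReduce(
                Arrays.asList(new MyHugeAvroObject(2), new MyHugeAvroObject(3)).iterator());
        System.out.println(reduced.getCount()); // prints 5
    }
}
```

This works, but the reader still gets back a full complex object with one
meaningful field, which is what I would like to avoid.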

Rather than using a TypedValueCombiner, I looked into using an Aggregator
(which appears to be deprecated as of 1.4). It appears to let me return
arbitrary values, but despite running setiter, my aggregator doesn't seem
to do anything.

I also tried implementing a WrappingIterator (such as Accumulo's
CountingIterator), which also appears to allow me to return arbitrary
values, but I get cryptic errors when running setiter. I'm on
Accumulo 1.6:

root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class
org.apache.accumulo.core.iterators.system.CountingIterator
2014-07-14 11:12:55,623 [shell.Shell] ERROR:
java.lang.IllegalArgumentException:
org.apache.accumulo.core.iterators.system.CountingIterator

This is odd because other included implementations of WrappingIterator seem
to work (perhaps the implementation of CountingIterator is dated):
root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class
org.apache.accumulo.core.iterators.system.DeletingIterator
The iterator class does not implement OptionDescriber. Consider this for
better iterator configuration using this setiter command.
Name for iterator (enter to skip):

All in all, how can I aggregate simple values, like counters, from rows
with complex Avro objects as Values without having to add aggregation
fields to these Value objects?

Thanks!

-Mike
