Hello all,
I've been using streaming with the aggregate package (available via
-reducer aggregate) and have been very happy with what it gives me.
I'm interested in writing my own new aggregate functions (in Java)
which I could then access from my streaming code.
Can anyone give me pointers on how to make that happen? I've
read through the aggregate package source, but I can't see how to
define my own aggregator and then access it from streaming.
To be specific, here's the sort of thing I'd like to be able to do:
- In Java, define a SampleValues aggregator, which chooses a sample
of the input given to it
- From my streaming program (in Python, say), output:
    SampleValues:some_key \t some_value
- Have the aggregate framework somehow call my new aggregator for
the combiner and reducer steps
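For concreteness, here is a standalone sketch of the aggregator half of that wish list. It does not use Hadoop at all. SimpleAggregator, SampleValues, processLine, the capacity of 3 and the seed are all hypothetical names and values made up for illustration; they are not the aggregate package's real interfaces, and the sketch does not show how the framework would find a new type. The post doesn't say how SampleValues should pick its sample, so this uses reservoir sampling (Algorithm R) as one possibility. It also parses lines in the "Type:key \t value" form shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SampleValuesSketch {

    // Hypothetical minimal contract: feed values in, then ask for a report.
    // Not Hadoop's actual aggregator interface.
    interface SimpleAggregator {
        void addNextValue(String value);
        String getReport();
    }

    // Reservoir sampling (Algorithm R): keeps a uniform random sample of at
    // most `capacity` values out of everything passed to addNextValue.
    static class SampleValues implements SimpleAggregator {
        private final int capacity;
        private final Random random;
        private final List<String> reservoir = new ArrayList<>();
        private long seen = 0;

        SampleValues(int capacity, long seed) {
            this.capacity = capacity;
            this.random = new Random(seed);
        }

        @Override
        public void addNextValue(String value) {
            seen++;
            if (reservoir.size() < capacity) {
                reservoir.add(value);
            } else {
                // Pick j uniformly in [0, seen); keep the value if j lands
                // in the reservoir, i.e. with probability capacity / seen.
                long j = (long) (random.nextDouble() * seen);
                if (j < capacity) {
                    reservoir.set((int) j, value);
                }
            }
        }

        List<String> sample() { return reservoir; }

        long seen() { return seen; }

        @Override
        public String getReport() {
            return String.join(",", reservoir);
        }
    }

    // Parses one line of the form "Type:key<TAB>value" and routes the value
    // to that key's aggregator. Only the hypothetical "SampleValues" type
    // is recognized here.
    static void processLine(String line, Map<String, SampleValues> byKey) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab: " + line);
        }
        String typeAndKey = line.substring(0, tab);
        String value = line.substring(tab + 1);
        int colon = typeAndKey.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing type prefix: " + line);
        }
        String type = typeAndKey.substring(0, colon);
        String key = typeAndKey.substring(colon + 1);
        if (!type.equals("SampleValues")) {
            throw new IllegalArgumentException("unknown aggregator type: " + type);
        }
        byKey.computeIfAbsent(key, k -> new SampleValues(3, 42L)).addNextValue(value);
    }

    public static void main(String[] args) {
        Map<String, SampleValues> byKey = new HashMap<>();
        String[] lines = {
            "SampleValues:red\tapple",
            "SampleValues:red\tcherry",
            "SampleValues:red\tstrawberry",
            "SampleValues:red\ttomato",
            "SampleValues:green\tlime",
        };
        for (String line : lines) {
            processLine(line, byKey);
        }
        for (Map.Entry<String, SampleValues> e : byKey.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().getReport());
        }
    }
}
```

One caveat for the combiner step: if each combiner emits only its reservoir and the reducer simply re-samples those values, the result is no longer uniform over the original input, because a combiner that saw a few values gets as much weight as one that saw millions. A mergeable version would also need to carry each partial sample's count (seen() above) into the merge.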
Thanks,
-Dan Milstein