Writing your own aggregators for use from streaming really isn't documented anywhere. There is a small section about it in ch08 of my book, but it didn't make it into the alpha of ch08 that is currently up.
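For reference, the streaming side of the setup Dan describes below is the part that is documented: with -reducer aggregate, the mapper prints lines of the form function:key, a tab, then the value. Here is a minimal Python sketch of such a mapper, assuming the built-in LongValueSum function. The helper names (format_record, map_lines, main) are made up for illustration, and SampleValues from the message below is hypothetical; it would only work once such an aggregator exists and is made known to the framework.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that feeds the aggregate reducer.

Each output line has the form "<function>:<key>\t<value>".
LongValueSum is one of the built-in aggregate functions; any other
name (such as SampleValues) is an assumption until it is implemented.
"""
import sys


def format_record(function, key, value):
    # Build one record in the layout the aggregate reducer parses.
    return "%s:%s\t%s" % (function, key, value)


def map_lines(lines, function="LongValueSum"):
    # Emit one record per whitespace-separated token, with a count of 1.
    for line in lines:
        for word in line.split():
            yield format_record(function, word, 1)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Streaming hands the input split to the mapper on stdin.
    for record in map_lines(stdin):
        stdout.write(record + "\n")


# A real mapper script would end with a call to main().
```

Run with -mapper pointing at this script and -reducer aggregate, this gives a word count. Swapping the function name changes which aggregator handles the key.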

On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein <dmilst...@hubteam.com> wrote:
> Hello all,
>
> I've been using streaming + the aggregate package (available via -reducer
> aggregate), and have been very happy with what it gives me.
>
> I'm interested in writing my own new aggregate functions (in Java) which I
> could then access from my streaming code.
>
> Can anyone give me pointers towards how to make that happen? I've read
> through the aggregate package source, but I'm not seeing how to define my
> own, and get access to it from streaming.
>
> To be specific, here's the sort of thing I'd like to be able to do:
>
> - In Java, define a SampleValues aggregator, which chooses a sample of the
>   input given to it
>
> - From my streaming program, in say python, output:
>
>   SampleValues:some_key \t some_value
>
> - Have the aggregate framework somehow call my new aggregator for the
>   combiner and reducer steps
>
> Thanks,
> -Dan Milstein
>

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422