[ https://issues.apache.org/jira/browse/PIG-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272968#comment-13272968 ]
Jonathan Coveney commented on PIG-2651:
---------------------------------------

Please find attached a patch with tests.

Note: in the process of adding the tests, I ran into this:
https://issues.apache.org/jira/browse/PIG-2694
It's not blocking, but something to consider...

Also: this patch includes the contents of https://issues.apache.org/jira/browse/PIG-2066.

The new files are:
.../apache/pig/IteratingAccumulatorEvalFunc.java
.../udf/evalfunc/IteratingAccumulatorCount.java
.../udf/evalfunc/IteratingAccumulatorIsEmpty.java
.../test/udf/evalfunc/IteratingAccumulatorSum.java

And of course, new e2e tests in nightly.conf.

> Provide a much easier to use accumulator interface
> --------------------------------------------------
>
>                 Key: PIG-2651
>                 URL: https://issues.apache.org/jira/browse/PIG-2651
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2651-0.patch, PIG-2651-1.patch
>
>
> This introduces a new interface, IteratingAccumulatorEvalFunc (that name is
> NOT final...). The cool thing about this patch is that it is built purely on
> top of the existing Accumulator code (well, it uses PIG-2066, but it could
> easily work without it). In other words, it offers an easier way to write
> accumulators without having to fork the Pig codebase.
>
> The downside is that the only way I am able to provide such a clean interface
> is by using a second thread. I need to explore the potential performance
> implications, but given that most of Pig's easy-to-use features already have
> performance implications, I think that as long as we measure and document
> them, the much more usable interface is worth it. Plus, I don't think it will
> be too bad, as one thread does the heavy lifting while the other just ferries
> values in between.
>
> SUM could now be written as:
> {code}
> import java.io.IOException;
> import java.util.Iterator;
> import org.apache.pig.IteratingAccumulatorEvalFunc;
> import org.apache.pig.data.Tuple;
>
> public class SUM extends IteratingAccumulatorEvalFunc<Long> {
>     public Long exec(Iterator<Tuple> it) throws IOException {
>         long sum = 0;
>         while (it.hasNext()) {
>             Long value = (Long) it.next().get(0); // field 0 may be null
>             if (value != null) {
>                 sum += value; // skip nulls instead of throwing on unboxing
>             }
>         }
>         return sum;
>     }
> }
> {code}
>
> Besides performance tests, I need to figure out how to properly test this
> sort of thing. I particularly welcome advice on that front.
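The issue description explains the second thread only in outline. Below is a minimal, dependency-free sketch of how such a bridge could work, assuming a bounded queue between the two threads; it is not the code in PIG-2651-1.patch. `IteratingBridge`, `QueueIterator`, the `END` marker and the queue capacity are hypothetical names, and values arrive as `List<Long>` chunks rather than Pig's `Tuple`/`DataBag`; `accumulate`, `getValue` and `cleanup` only mirror the shape of Pig's `Accumulator` interface. The calling thread ferries values into the queue, while a worker thread runs the user's `exec()` over a blocking iterator.

```java
import java.io.IOException;
import java.util.*;
import java.util.concurrent.*;

class IteratingBridgeDemo {
    // Hypothetical iterating accumulator: the three methods mirror the shape of
    // Pig's Accumulator, but values arrive as List<Long> chunks, not a DataBag.
    abstract static class IteratingBridge<T> {
        private static final Object END = new Object(); // end-of-input marker
        private final int capacity;                      // bounded queue size
        private BlockingQueue<Object> queue;
        private FutureTask<T> task;
        IteratingBridge(int capacity) { this.capacity = capacity; }

        // What the UDF author writes: consume all values through an iterator.
        protected abstract T exec(Iterator<Long> it) throws IOException;
        // Lazily start the worker thread that runs exec() for the current key.
        private void start() {
            if (task != null) return;
            BlockingQueue<Object> q = new ArrayBlockingQueue<>(capacity);
            queue = q;
            task = new FutureTask<>(() -> exec(new QueueIterator(q)));
            Thread worker = new Thread(task, "iterating-accumulator");
            worker.setDaemon(true);
            worker.start();
        }

        // Ferry one chunk of values to the worker (cf. Accumulator.accumulate).
        void accumulate(List<Long> chunk) throws InterruptedException {
            start();
            for (Long v : chunk) {
                if (v == null) continue; // BlockingQueue rejects nulls; skip them
                // Wait for space, but give up if exec() already returned early
                // (e.g. an IsEmpty-style function) instead of blocking forever.
                while (!queue.offer(v, 10, TimeUnit.MILLISECONDS)) {
                    if (task.isDone()) return;
                }
            }
        }

        // Signal end of input, then wait for exec()'s result (cf. getValue).
        T getValue() throws InterruptedException, ExecutionException {
            start(); // also covers a key that received no values
            while (!queue.offer(END, 10, TimeUnit.MILLISECONDS)) {
                if (task.isDone()) break;
            }
            return task.get();
        }

        // Reset for the next key (cf. cleanup); interrupts a worker still waiting.
        void cleanup() {
            if (task != null) task.cancel(true); // no-op if exec() already finished
            task = null;
            queue = null;
        }

        // Iterator view over the queue: hasNext() blocks until a value or END arrives.
        private static final class QueueIterator implements Iterator<Long> {
            private final BlockingQueue<Object> q;
            private Object lookahead; // null means "not fetched yet"
            QueueIterator(BlockingQueue<Object> q) { this.q = q; }
            @Override public boolean hasNext() {
                if (lookahead == null) {
                    try {
                        lookahead = q.take();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("accumulation cancelled", e);
                    }
                }
                return lookahead != END;
            }
            @Override public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                Long v = (Long) lookahead;
                lookahead = null;
                return v;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The SUM from the issue description, written against this sketch.
        IteratingBridge<Long> sum = new IteratingBridge<Long>(4) {
            @Override protected Long exec(Iterator<Long> it) {
                long s = 0;
                while (it.hasNext()) s += it.next();
                return s;
            }
        };
        sum.accumulate(Arrays.asList(1L, 2L, 3L));
        sum.accumulate(Arrays.asList(4L, 5L));
        System.out.println(sum.getValue()); // prints 15
        sum.cleanup();
    }
}
```

The bounded queue caps how many values are in flight between the two threads, but it means `accumulate()` has to notice when `exec()` returns before consuming everything, which is why the sketch polls with a timeout instead of blocking on `put()`. The 10 ms poll interval is an arbitrary choice here, not a measured one.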