[ https://issues.apache.org/jira/browse/PIG-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256065#comment-13256065 ]
Alan Gates commented on PIG-2651:
---------------------------------

In general looks good. This will be great for performance in certain situations. A few issues:

* The new files need Apache License headers.
* The new files need javadocs.
* The new files need stability and audience annotations.
* We need tests. In particular, they should check that this works standalone, that multiple work together, and that it works with other non-TerminatingAccumulator accumulators.

> Provide a much easier to use accumulator interface
> --------------------------------------------------
>
> Key: PIG-2651
> URL: https://issues.apache.org/jira/browse/PIG-2651
> Project: Pig
> Issue Type: New Feature
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2651-0.patch
>
>
> This introduces a new interface, IteratingAccumulatorEvalFunc (that name is
> NOT final...). The cool thing about this patch is that it is built purely on
> top of the existing Accumulator code (well, it uses PIG-2066, but it could
> easily work without it). That is to say, it's an easier way to write
> accumulators without having to fork the Pig codebase.
> The downside is that the only way I am able to provide such a clean interface
> is by using a second thread. I need to explore any potential performance
> implications, but given that most of the easy-to-use Pig stuff has
> performance implications, I think as long as we measure and document
> them, it's worth the much more usable interface. Plus I don't think it will
> be too bad, as one thread does the heavy lifting while another just ferries
> values in between.
> SUM could now be written as:
> {code}
> public class SUM extends IteratingAccumulatorEvalFunc<Long> {
>     public Long exec(Iterator<Tuple> it) throws IOException {
>         long sum = 0;
>         while (it.hasNext()) {
>             sum += (Long)it.next().get(0);
>         }
>         return sum;
>     }
> }
> {code}
> Besides performance tests, I need to figure out how to properly test this
> sort of thing. I particularly welcome advice on that front.
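
For readers who want to picture the two-thread arrangement described above, here is a minimal sketch in plain Java. It is not the code from PIG-2651-0.patch and uses no Pig classes: the class name IteratingSketch, its accumulate/getValue methods, and the END sentinel are all hypothetical. It assumes the user function drains the iterator completely, never throws, and never receives null values; if any of those assumptions fails, the feeding thread would block.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.SynchronousQueue;
import java.util.function.Function;

/** Hypothetical stand-in for the idea in the description; not the patch's class. */
class IteratingSketch<T, R> {
    // Sentinel that tells the worker no more values are coming.
    private static final Object END = new Object();

    // Zero-capacity hand-off: the caller blocks until the worker takes each value.
    private final SynchronousQueue<Object> handoff = new SynchronousQueue<>();
    private final Thread worker;
    private volatile R result;

    IteratingSketch(Function<Iterator<T>, R> exec) {
        // The second thread runs the user's iterator-based function (the heavy lifting).
        worker = new Thread(() -> result = exec.apply(new QueueIterator()));
        worker.setDaemon(true);
        worker.start();
    }

    /** Called once per batch; ferries each value over to the worker thread. */
    void accumulate(Iterable<T> batch) {
        try {
            for (T value : batch) {
                handoff.put(value);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Signals end of input, waits for the worker, and returns its result. */
    R getValue() {
        try {
            handoff.put(END);
            worker.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Iterator view over the hand-off queue, used only by the worker thread. */
    private final class QueueIterator implements Iterator<T> {
        private Object next; // null means "nothing fetched yet"

        @Override
        public boolean hasNext() {
            if (next == null) {
                try {
                    next = handoff.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    next = END; // treat interruption as end of input
                }
            }
            return next != END;
        }

        @Override
        @SuppressWarnings("unchecked")
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T value = (T) next;
            next = null;
            return value;
        }
    }
}
```

In this sketch, a sum over Longs would be built by passing a function that loops over the iterator, then calling accumulate once per batch and getValue at the end. The zero-capacity queue keeps memory flat at the cost of a thread hand-off per value, which is the kind of overhead the description says still needs measuring.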