[
https://issues.apache.org/jira/browse/CRUNCH-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805695#comment-13805695
]
Gabriel Reid commented on CRUNCH-286:
-------------------------------------
I was just thinking about this one again, and started coming back around to the
idea that Josh had about making it possible for a DoFn to see which context
it's running in. What I'm thinking is that we could introduce a multi-phase
CombineFn implementation that automatically selects the underlying CombineFn
to run based on the context it's running in, something like this:
{code}
MultiPhaseCombineFn<K,V>(CombineFn<K,V> mapPhaseCombineFn,
                         CombineFn<K,V> reducePhaseCombineFn)
{code}
This would give us the same functionality as the approach here, but wouldn't
require changing the interface of PGroupedTable. It would also avoid adding
more direct links to MapReduce in the PCollection API (not something I'm that
worried about, but still maybe worth considering). I'm definitely ok with this
approach too, but just wanted to put the other approach out there to see if
anyone has any other thoughts on it.
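
To make the delegation idea concrete, here is a minimal standalone sketch of
how such a wrapper could pick a function per phase. It deliberately does not
use the Crunch API: the Phase enum, the SimpleCombineFn interface, and the
initialize(Phase) hook are all hypothetical stand-ins, since how a function
would actually learn its running context is exactly the open question here.

```java
import java.util.Arrays;

public class MultiPhaseSketch {

    // Hypothetical stand-in for "the context the function is running in".
    enum Phase { MAP, REDUCE }

    // Hypothetical, simplified combine function: many values in, one value out.
    // This is NOT org.apache.crunch.CombineFn.
    interface SimpleCombineFn<V> {
        V combine(Iterable<V> values);
    }

    // Wrapper that delegates to one of two functions depending on the phase.
    static final class MultiPhaseCombineFn<V> implements SimpleCombineFn<V> {
        private final SimpleCombineFn<V> mapPhaseCombineFn;
        private final SimpleCombineFn<V> reducePhaseCombineFn;
        private Phase phase = Phase.REDUCE;

        MultiPhaseCombineFn(SimpleCombineFn<V> mapPhaseCombineFn,
                            SimpleCombineFn<V> reducePhaseCombineFn) {
            this.mapPhaseCombineFn = mapPhaseCombineFn;
            this.reducePhaseCombineFn = reducePhaseCombineFn;
        }

        // Assumed hook: the framework would tell the function where it runs.
        void initialize(Phase phase) {
            this.phase = phase;
        }

        @Override
        public V combine(Iterable<V> values) {
            SimpleCombineFn<V> delegate =
                (phase == Phase.MAP) ? mapPhaseCombineFn : reducePhaseCombineFn;
            return delegate.combine(values);
        }
    }

    // Map-side function: a plain partial sum.
    static final SimpleCombineFn<Integer> SUM = values -> {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    };

    // Reduce-side function: the same sum, but capped at 100.
    static final SimpleCombineFn<Integer> CAPPED_SUM =
        values -> Math.min(100, SUM.combine(values));

    public static void main(String[] args) {
        MultiPhaseCombineFn<Integer> fn = new MultiPhaseCombineFn<>(SUM, CAPPED_SUM);

        fn.initialize(Phase.MAP);
        int partialA = fn.combine(Arrays.asList(1, 2, 3));  // 6
        int partialB = fn.combine(Arrays.asList(40, 60));   // 100

        fn.initialize(Phase.REDUCE);
        int result = fn.combine(Arrays.asList(partialA, partialB));
        System.out.println(result);  // prints 100 (106 capped at 100)
    }
}
```

The point of the sketch is only that the caller still passes a single function
object, so nothing on the PGroupedTable side needs to change.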
> ability to specify a different function for combiner & reducer
> --------------------------------------------------------------
>
> Key: CRUNCH-286
> URL: https://issues.apache.org/jira/browse/CRUNCH-286
> Project: Crunch
> Issue Type: New Feature
> Components: Core
> Reporter: Stefan De Smit
> Assignee: Josh Wills
> Priority: Minor
> Attachments:
> 0001-add-combineValues-method-with-2-function-arguments.patch, 0002-.patch
>
>
> Extend PGroupedTable with an extra combineValues function that accepts 2
> functions: 1 for combiner phase, 1 for reducer phase.
> This way, a different algorithm can be applied in each phase.