On Tue, Apr 18, 2017 at 3:03 AM, Wesley Tanaka <wtan...@yahoo.com.invalid> wrote:
> I believe that foldl in Haskell https://www.haskell.org/hoogle/?hoogle=foldl
> admits a separate accumulator type from the type of the data structure being
> "folded"
>
> And, well, python lets you have your way with mixing types, but this
> certainly works as another example:
> python -c "print(reduce(lambda ac, elem: '%s%d' % (ac,elem), [1,2,3,4,5], ''))"
>
> Is there anything special about the AccumT->OutputT conversion that
> extractOutput() needs to be in the same interface as createAccumulator(),
> addInput() and mergeAccumulators()? If the interface were segregated such
> that one interface managed the InputT->AccumT conversion, and the second
> managed the AccumT->InputT conversion, it seems like maybe the
> AccumT->OutputT conversion could even get replaced with MapElements? And
> then the full current "Combine" functionality could be implemented as a
> composition of the lower-level primitives?
It is correct that the AccumT->OutputT conversion could be implemented as a subsequent MapElements operation. One reason that it's not is that in practice it's tightly coupled with the other parts of the Combine as a single semantic unit (e.g. one thinks of taking the "mean" as a single operation). Once one moves beyond the simple combiners (with identity extractOutput) there's often a 1:1 correspondence between the CombineFn and its output extraction. That reduces the potential value of splitting them, while placing a further burden on the user (e.g. there's little use for the Quantiles intermediate accumulator without the mapping from that to the actual quantiles, and vice versa).

Also, letting the extractOutput be part of the CombineFn itself, rather than simply providing a separate Map, allows one to use the CombineFn uniformly for global combine, per-key combine, and combined state. It also makes it more composable (e.g. the TupleCombineFns at https://github.com/apache/beam/blob/release-0.6.0/sdks/python/apache_beam/transforms/combiners.py#L443 that create a single CombineFn (including the output extraction) from a set of CombineFns, applying each in parallel).

- Robert
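
To make the coupling argument concrete, here is a minimal sketch in plain Python. It is not Beam's API: the classes MeanCombineFn, CountCombineFn and TupleCombineFn and the helpers combine_globally and combine_per_key are hypothetical stand-ins that only mirror the four methods discussed in this thread (createAccumulator, addInput, mergeAccumulators, extractOutput). It assumes a (sum, count) accumulator for the mean, which is one plausible choice and not necessarily what any SDK does.

```python
from functools import reduce


class MeanCombineFn:
    """Hypothetical stand-in for a Beam-style CombineFn (not the Beam class)."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, count): AccumT differs from InputT

    def add_input(self, acc, element):
        total, count = acc
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return (sum(a[0] for a in accumulators),
                sum(a[1] for a in accumulators))

    def extract_output(self, acc):
        # AccumT -> OutputT: the (sum, count) pair is of little use by itself.
        total, count = acc
        return total / count if count else float("nan")


class CountCombineFn:
    """A 'simple' combiner: extract_output is the identity."""

    def create_accumulator(self):
        return 0

    def add_input(self, acc, element):
        return acc + 1

    def merge_accumulators(self, accumulators):
        return sum(accumulators)

    def extract_output(self, acc):
        return acc


class TupleCombineFn:
    """Hypothetical composition: applies several combine fns in parallel."""

    def __init__(self, *fns):
        self.fns = fns

    def create_accumulator(self):
        return tuple(fn.create_accumulator() for fn in self.fns)

    def add_input(self, acc, element):
        return tuple(fn.add_input(a, element) for fn, a in zip(self.fns, acc))

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return tuple(fn.merge_accumulators([acc[i] for acc in accumulators])
                     for i, fn in enumerate(self.fns))

    def extract_output(self, acc):
        # Composition only works because each fn carries its own extraction.
        return tuple(fn.extract_output(a) for fn, a in zip(self.fns, acc))


def combine_globally(fn, elements):
    """Fold all elements into one accumulator, then extract the output."""
    acc = reduce(fn.add_input, elements, fn.create_accumulator())
    return fn.extract_output(acc)


def combine_per_key(fn, pairs):
    """Same fn object, reused unchanged, with one accumulator per key."""
    accs = {}
    for key, value in pairs:
        accs[key] = fn.add_input(accs.get(key, fn.create_accumulator()), value)
    return {key: fn.extract_output(acc) for key, acc in accs.items()}
```

With these definitions, combine_globally(MeanCombineFn(), [1, 2, 3, 4, 5]) gives 3.0, and the same MeanCombineFn instance can be passed to combine_per_key or wrapped in TupleCombineFn without the caller having to remember a matching Map step. If extract_output lived in a separate transform, each of those three call sites would need to pair the two pieces by hand.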