On Tue, Apr 18, 2017 at 3:03 AM, Wesley Tanaka
<wtan...@yahoo.com.invalid> wrote:
> I believe that foldl in Haskell https://www.haskell.org/hoogle/?hoogle=foldl 
> admits a separate accumulator type from the type of the data structure being 
> "folded"
> And, well, python lets you have your way with mixing types, but this
> certainly works as another example:
> python -c "print(reduce(lambda ac, elem: '%s%d' % (ac,elem), [1,2,3,4,5], ''))"
> Is there anything special about the AccumT->OutputT conversion that 
> extractOutput() needs to be in the same interface as createAccumulator(), 
> addInput() and mergeAccumulators()?  If the interface were segregated such 
> that one interface managed the InputT->AccumT conversion, and the second 
> managed the AccumT->OutputT conversion, it seems like maybe the
> AccumT->OutputT conversion could even get replaced with MapElements?  And 
> then the full current "Combine" functionality could be implemented as a 
> composition of the lower-level primitives?

It is correct that the AccumT->OutputT conversion could be implemented
as a subsequent MapElements operation. One reason that it's not is
that in practice it's tightly coupled with the other parts of the
Combine as a single semantic unit (e.g. one thinks of taking the
"mean" as a single operation). Once one moves beyond the simple
combiners (with identity extractOutput) there's often a 1:1
correspondence between the CombineFn and its output extraction that
reduces the potential value of splitting them while placing a further
burden on their user (e.g. there's little use for the Quantiles
intermediate accumulator without the mapping from that to the actual
quantiles, and vice versa).
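
To make the "mean" case concrete, here is a minimal sketch in plain
Python. It does not import Beam; the MeanFn class and the combine()
driver are hypothetical names of mine that only mirror the four
CombineFn methods. The (sum, count) accumulator means little without
the division in extract_output, and that division is useless with any
other accumulator.

```python
# Standalone sketch: mirrors the shape of a CombineFn without importing Beam.
class MeanFn:
    def create_accumulator(self):
        return (0.0, 0)  # (running sum, element count)

    def add_input(self, acc, element):
        total, count = acc
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return (sum(a[0] for a in accumulators),
                sum(a[1] for a in accumulators))

    def extract_output(self, acc):
        # The AccumT -> OutputT step; only meaningful for this accumulator.
        total, count = acc
        return total / count if count else float("nan")


def combine(fn, bundles):
    """Hypothetical driver: fold each bundle, merge the partials, extract."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


mean = combine(MeanFn(), [[1, 2], [3, 4, 5]])
print(mean)
```

If extract_output were split out into a separate Map, every user of
the accumulator would have to remember to pair it with exactly that
function.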

Also, letting the extractOutput be part of the CombineFn itself rather
than simply providing a separate Map allows one to use the CombineFn
uniformly for global combine, per-key combine, and combined state. It
also makes it more composable (e.g. the TupleCombineFns at
https://github.com/apache/beam/blob/release-0.6.0/sdks/python/apache_beam/transforms/combiners.py#L443
create a single CombineFn, including the output extraction, from
a set of CombineFns, applying each in parallel).
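
A rough sketch of both points, again in plain Python without Beam.
TupleFn, combine_globally() and combine_per_key() are hypothetical
stand-ins of mine, not the code at the link above; this TupleFn feeds
every component the same input element, which is only one way such a
combiner could work.

```python
# Standalone sketch; none of these classes come from Beam itself.
class SumFn:
    def create_accumulator(self):
        return 0

    def add_input(self, acc, element):
        return acc + element

    def merge_accumulators(self, accumulators):
        return sum(accumulators)

    def extract_output(self, acc):
        return acc  # identity extraction


class MeanFn:
    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, acc, element):
        return (acc[0] + element, acc[1] + 1)

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return (sum(a[0] for a in accumulators),
                sum(a[1] for a in accumulators))

    def extract_output(self, acc):
        return acc[0] / acc[1] if acc[1] else float("nan")


class TupleFn:
    """Hypothetical composite: runs several fns side by side on each input."""

    def __init__(self, *fns):
        self.fns = fns

    def create_accumulator(self):
        return [fn.create_accumulator() for fn in self.fns]

    def add_input(self, accs, element):
        return [fn.add_input(a, element) for fn, a in zip(self.fns, accs)]

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        return [fn.merge_accumulators([accs[i] for accs in accumulators])
                for i, fn in enumerate(self.fns)]

    def extract_output(self, accs):
        # Each component's own extraction travels with it.
        return tuple(fn.extract_output(a) for fn, a in zip(self.fns, accs))


def combine_globally(fn, elements):
    # Single-bundle fold, so merge_accumulators is not needed here.
    acc = fn.create_accumulator()
    for element in elements:
        acc = fn.add_input(acc, element)
    return fn.extract_output(acc)


def combine_per_key(fn, pairs):
    accs = {}
    for key, value in pairs:
        acc = accs[key] if key in accs else fn.create_accumulator()
        accs[key] = fn.add_input(acc, value)
    return {key: fn.extract_output(acc) for key, acc in accs.items()}


stats = TupleFn(SumFn(), MeanFn())
global_result = combine_globally(stats, [1, 2, 3, 4])
keyed_result = combine_per_key(stats, [("a", 1), ("b", 10), ("a", 3)])
```

The same stats object works in both drivers with no extra Map step,
and TupleFn never needs to know how its components turn accumulators
into outputs.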

- Robert
