Re: Naming of Combine.Globally

Aljoscha Krettek Tue, 18 Apr 2017 03:16:47 -0700

The definition of foldl in Haskell is the same as the description I gave 
earlier:


foldl :: (a -> b -> a) -> a -> [b] -> a

The function (a -> b -> a) is what I described as (T, A) -> A and it’s used to 
fold a list of b’s into an a (the accumulator type).

You’re right that the mapping AccumT->OutputT is not important and could be 
delegated to a separate method. The important part of the interface is 
mergeAccumulators() since this makes the operation distributive: we can “fold” 
a bunch of Ts into As in parallel (even on different machines) and then merge 
them together. This is what is missing from a functional fold.

Best,
Aljoscha


> On 18. Apr 2017, at 12:03, Wesley Tanaka <wtan...@yahoo.com.INVALID> wrote:
> 
> I believe that foldl in Haskell https://www.haskell.org/hoogle/?hoogle=foldl 
> admits a separate accumulator type from the type of the data structure being 
> "folded"
> And, well, python lets you have your way with mixing types, but this 
> certainly works as another example:python -c "print(reduce(lambda ac, elem: 
> '%s%d' % (ac,elem), [1,2,3,4,5], ''))"
> Is there anything special about the AccumT->OutputT conversion that 
> extractOutput() needs to be in the same interface as createAccumulator(), 
> addInput() and mergeAccumulators()?  If the interface were segregated such 
> that one interface managed the InputT->AccumT conversion, and the second 
> managed the AccumT->InputT conversion, it seems like maybe the 
> AccumT->OutputT conversion could even get replaced with MapElements?  And 
> then the full current "Combine" functionality could be implemented as a 
> composition of the lower-level primitives?
> I haven't dug that deeply into Combine yet, so I may be missing something 
> obvious.
> ---
> Wesley Tanaka
> https://wtanaka.com/
> 
> On Monday, April 17, 2017, 11:32:29 PM HST, Aljoscha Krettek 
> <aljos...@apache.org> wrote:Hi,
> I think both fold and reduce fail to capture all the power or (what we call) 
> combine. Reduce requires a function of type (T, T) -> T. It requires that the 
> output type be the same as the input type. Fold takes a function (T, A) -> A 
> where T is the input type and A is the accumulation type. Here, the output 
> type can be different from the input type. However, there is no way of 
> combining these aggregators so the operation is not distributive, i.e. we 
> cannot hierarchically apply the operation.
> 
> Combine is the generalisation of this: We have three types, T (input), A 
> (accumulator), O (output) and we require a function that can merge 
> accumulators. The operation is distributive, meaning we can efficiently 
> execute it and we can also have an output type that is different from the 
> input type.
> 
> Quick FYI: in Flink the CombineFn is called AggregatingFunction and 
> CombiningState is AggregatingState.
> 
> Best,
> Aljoscha
>> On 18. Apr 2017, at 04:29, Wesley Tanaka <wtan...@yahoo.com.INVALID> wrote:
>> 
>> As I start to understand Combine.Globally, it seems that it is, in spirit, 
>> Beam's implementation of the "fold" higher-order function
>> https://en.wikipedia.org/wiki/Fold_(higher-order_function)#Folds_in_various_languages
>> 
>> Was there a reason the word "combine" was picked instead of either "fold" or 
>> "reduce"?  From the wikipedia list above, it seems as though "fold" and 
>> "reduce" are in much more common usage, so either of those might be easier 
>> for newcomers to understand.
>> ---
>> Wesley Tanaka
>> http://wtanaka.com/

Re: Naming of Combine.Globally

Reply via email to