Re: Parallelism in Combine.GroupedValues

Luke Cwik Tue, 12 May 2020 08:06:01 -0700

There is more than one instance of an accumulator being used and then those
accumulators are merged using mergeAccumulators method.


Two examples of when combining happens in parallel is when the withFewKeys
hint is used on the combiner or when there is partial combining[1]
happening on the mapper side before the grouping operation.

1: https://s.apache.org/beam-runner-api-combine-model

On Tue, May 12, 2020 at 7:05 AM rahul patwari <[email protected]>
wrote:

> Hi,
>
> In the Javadoc for Combine.GroupedValues[1], it has been described that 
> *combining
> the values associated with a single key can happen in parallel*.
> The logic to combine values associated with a key can be provided by
> CombineFnWithContext (or) CombineFn.
> Both CombineFnWithContext.apply()[2] and CombineFn.apply()[3] uses a
> single accumulator to combine the values.
>
> My understanding is that the parallelism in Combine PTransform will be
> determined by the no. of accumulators. But, the Javadoc describes that
> combining is done in parallel even though the no. of accumulators used to
> combine is one.
>
> How can combine happen parallelly by using only one accumulator?
>
> Regards,
> Rahul
>
> [1]:
> https://github.com/apache/beam/blob/334682d4a8ac5e1ebd298ba3b8020a9161884927/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L2075-L2078
> [2]:
> https://github.com/apache/beam/blob/53e5cee254023152e77a3fc46564642dc9b6b506/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineWithContext.java#L117
> [3]:
> https://github.com/apache/beam/blob/334682d4a8ac5e1ebd298ba3b8020a9161884927/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L443
>

Re: Parallelism in Combine.GroupedValues

Reply via email to