Hi everyone,

This email should be relevant to anyone interested in the portable pipeline
model. A few months ago I sent out an email with this doc describing my
ideas for modelling portable combines that support lifting:
https://s.apache.org/beam-runner-api-combine-model

Recently, after some offline discussion with other devs working on
portability, I'd like to add a new URN to the spec, and I figured I should
update the dev list again to get feedback on the idea. If no one has
any problems with the proposal, I'm hoping to add this to the doc in a few
days.

*Proposal:*
The doc currently has only one way to execute unlifted combines: execute
the ParDo within the CombinePerKey composite provided by the SDK.

The proposal is to add a second way to execute unlifted combines: add a
URN to represent an unlifted combine step executed after a GroupByKey
transform, tentatively named "beam:transform:combine_grouped_values". Just
like the other combine parts listed in the doc, the URN would be in a
PTransform along with a CombineFn.
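
As a rough illustration of the intended semantics, here is a minimal
Python sketch of what a step tagged with this URN would compute on the
output of a GroupByKey. It is not the actual Beam implementation: the
combine_grouped_values function, the SumFn class, and the dict standing in
for the PTransform are all hypothetical. Only the URN string and the
CombineFn method names (create_accumulator, add_input, merge_accumulators,
extract_output) come from the proposal and the SDK's CombineFn interface.

```python
# Tentative URN from the proposal.
COMBINE_GROUPED_VALUES_URN = "beam:transform:combine_grouped_values"


class SumFn:
    """Toy CombineFn (hypothetical); method names mirror the SDK's CombineFn."""

    def create_accumulator(self):
        return 0

    def add_input(self, accumulator, element):
        return accumulator + element

    def merge_accumulators(self, accumulators):
        return sum(accumulators)

    def extract_output(self, accumulator):
        return accumulator


def combine_grouped_values(combine_fn, grouped):
    """Apply the full (unlifted) combine to each (key, values) pair.

    `grouped` is assumed to be the output of a GroupByKey: an iterable of
    (key, iterable_of_values) pairs. Because every value for a key is
    already in one place, merge_accumulators is never needed here.
    """
    for key, values in grouped:
        accumulator = combine_fn.create_accumulator()
        for value in values:
            accumulator = combine_fn.add_input(accumulator, value)
        yield key, combine_fn.extract_output(accumulator)


# Hypothetical stand-in for a PTransform carrying the URN and a CombineFn.
transform = {"urn": COMBINE_GROUPED_VALUES_URN, "combine_fn": SumFn()}

grouped = [("a", [1, 2, 3]), ("b", [10])]
result = list(combine_grouped_values(transform["combine_fn"], grouped))
```

The point of the sketch is that the step needs nothing beyond the
CombineFn itself, so it can be run without the SDK-provided ParDo.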

*Reasoning:*
Under the original spec the only way to execute unlifted combines is to
execute a ParDo containing the logic of that combine. In the best case this
is very straightforward: The Runner receives a CombinePerKey and sends it
to the SDK Harness for execution without changing anything and the ParDo
will execute the full combine.

However, this causes issues when sending the provided ParDo for execution
isn't straightforward. Situations may come up, due to runner implementation
details, where a CombineGroupedValues needs to be executed and the ParDo
associated with it is not easily retrieved or doesn't exist. This new URN
provides a backup option so that a full combine can be executed even then.
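
The fallback described above could look roughly like the following
Python sketch. Everything here except the URN string is an assumption made
for illustration: the UnliftedCombine record, its fields, and the
plan_unlifted_combine function are hypothetical and do not correspond to
any existing runner code.

```python
from dataclasses import dataclass
from typing import Optional

# Tentative URN from the proposal.
COMBINE_GROUPED_VALUES_URN = "beam:transform:combine_grouped_values"


@dataclass
class UnliftedCombine:
    """Hypothetical runner-side view of an unlifted combine step."""

    combine_fn_payload: bytes           # serialized CombineFn
    sdk_pardo_id: Optional[str] = None  # SDK-provided ParDo, if retrievable


def plan_unlifted_combine(step: UnliftedCombine) -> dict:
    """Pick how to execute the step (illustrative only)."""
    if step.sdk_pardo_id is not None:
        # Best case: forward the SDK's ParDo to the SDK Harness unchanged.
        return {"kind": "pardo", "transform_id": step.sdk_pardo_id}
    # Backup: the ParDo is missing or hard to retrieve, so describe the
    # step by URN plus CombineFn instead.
    return {
        "kind": "urn",
        "urn": COMBINE_GROUPED_VALUES_URN,
        "payload": step.combine_fn_payload,
    }
```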

Thank you,
Daniel Oliveira
