I like the new URN, since it also gives us a way to reuse the combine
payload for combining state specs. Runners could then execute a
gRPC Read + Combine Grouped Values + gRPC Write on the contents
of a StateSpec if it grows too large.
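
To make the read + combine + write idea concrete, here is a rough sketch.
Everything in it is hypothetical: the dict stands in for the state a runner
would reach over gRPC, compact_state and max_elements are made-up names, and
it assumes the combined output can be stored back as a single input value
(true for a sum, not for every CombineFn).

```python
class SumFn:
    """Minimal CombineFn-like object; method names mirror Beam's CombineFn."""

    def create_accumulator(self):
        return 0

    def add_input(self, accumulator, value):
        return accumulator + value

    def extract_output(self, accumulator):
        return accumulator


def compact_state(state, key, combine_fn, max_elements):
    """Collapse the values stored under `key` once there are too many.

    `state` is a plain dict used as a stand-in for runner-managed state.
    Returns True if the values were compacted, False otherwise.
    """
    values = state[key]  # stands in for the gRPC read
    if len(values) <= max_elements:
        return False

    # Combine Grouped Values: fold every stored value into one accumulator.
    accumulator = combine_fn.create_accumulator()
    for value in values:
        accumulator = combine_fn.add_input(accumulator, value)

    # Stands in for the gRPC write. Assumes output type == input type.
    state[key] = [combine_fn.extract_output(accumulator)]
    return True


state = {"user-1": [1, 2, 3, 4, 5]}
compacted = compact_state(state, "user-1", SumFn(), max_elements=3)
```

After the call, the five stored values are replaced by their single combined
value, and a second call would leave the state untouched.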

On Wed, May 23, 2018 at 1:57 PM Daniel Oliveira <[email protected]>
wrote:

> Hi everyone,
>
> This email should be relevant to anyone interested in the portable
> pipeline model. A few months ago I sent out an email with this doc
> describing my ideas for modelling portable combines that support lifting:
> https://s.apache.org/beam-runner-api-combine-model
>
> Recently, after some offline discussion with other devs working on
> portability, I'd like to add a new URN to the spec, and I figured I should
> update the dev list again to get feedback on the idea. If no one has
> any problems with the proposal, I'm hoping to add this to the doc in a few
> days.
>
> *Proposal:*
> The doc currently only has one way to execute unlifted combines: Execute
> the ParDo within the CombinePerKey composite provided by the SDK.
>
> The proposal is to add a second way to execute unlifted combines: a URN
> that represents an unlifted combine step executed after a GroupByKey
> transform, tentatively named "beam:transform:combine_grouped_values". Just
> like the other combine parts listed in the doc, the URN would be in a
> PTransform along with a CombineFn.
>
> *Reasoning:*
> Under the original spec the only way to execute unlifted combines is to
> execute a ParDo containing the logic of that combine. In the best case this
> is very straightforward: The Runner receives a CombinePerKey and sends it
> to the SDK Harness for execution without changing anything, and the ParDo
> executes the full combine.
>
> However, this causes issues when sending the provided ParDo for execution
> isn't straightforward. Situations may come up, due to runner implementation
> details, where a CombineGroupedValues needs to be executed and the ParDo
> associated with it is not easily retrieved or doesn't exist. This new URN
> provides a backup option so that a full combine can be executed even then.
>
> Thank you,
> Daniel Oliveira
>
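
For anyone skimming the thread, the semantics of the proposed step can be
sketched roughly as follows. This is only an illustration in plain Python,
not the Beam implementation: group_by_key and combine_grouped_values are
hypothetical helper names, and MeanFn is a made-up CombineFn-like class whose
method names mirror Beam's CombineFn.

```python
from collections import defaultdict


class MeanFn:
    """Minimal CombineFn-like object computing an arithmetic mean."""

    def create_accumulator(self):
        return (0.0, 0)  # (running total, element count)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def group_by_key(pairs):
    """Stand-in for GroupByKey: collect all values seen for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_grouped_values(grouped, combine_fn):
    """Unlifted combine: apply the whole CombineFn to each key's values."""
    result = {}
    for key, values in grouped.items():
        accumulator = combine_fn.create_accumulator()
        for value in values:
            accumulator = combine_fn.add_input(accumulator, value)
        result[key] = combine_fn.extract_output(accumulator)
    return result


pairs = [("a", 1), ("b", 10), ("a", 3)]
means = combine_grouped_values(group_by_key(pairs), MeanFn())
```

The point of the sketch is that the step needs only the CombineFn and the
already-grouped values, so it does not depend on the SDK-provided ParDo.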
