I like the new URN because it also gives us a way to reuse the combine payload when combining state specs. Runners could then execute a gRPC Read + Combine Grouped Values + gRPC Write over the contents of a StateSpec if that state grows too large.
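
To make that concrete, here is a rough sketch of the read / combine / write idea in plain Python. It is not the Beam SDK or the Fn API: `SumFn`, `combine_grouped_values`, `compact_state`, and the dict standing in for runner-managed state are all hypothetical names for illustration, and the URN string is only the tentative name from the proposal. It also assumes a CombineFn whose output type matches its input type (such as a sum), so the combined result can be written back in place of the original values.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Tentative URN name taken from the proposal; not a finalized identifier.
COMBINE_GROUPED_VALUES_URN = "beam:transform:combine_grouped_values"


class SumFn:
    """Hypothetical CombineFn-like object that sums integers."""

    def create_accumulator(self) -> int:
        return 0

    def add_input(self, accumulator: int, value: int) -> int:
        return accumulator + value

    def extract_output(self, accumulator: int) -> int:
        return accumulator


def combine_grouped_values(
    combine_fn: Any, grouped: Iterable[Tuple[Any, Iterable[Any]]]
) -> Iterator[Tuple[Any, Any]]:
    """Unlifted combine: fold each key's already-grouped values into one output."""
    for key, values in grouped:
        accumulator = combine_fn.create_accumulator()
        for value in values:
            accumulator = combine_fn.add_input(accumulator, value)
        yield key, combine_fn.extract_output(accumulator)


def compact_state(combine_fn: Any, state: Dict[Any, List[Any]], key: Any) -> None:
    """Read the values stored under `key`, combine them, write one value back."""
    values = state[key]  # stands in for the gRPC Read
    [(_, combined)] = list(combine_grouped_values(combine_fn, [(key, values)]))
    state[key] = [combined]  # stands in for the gRPC Write


grouped_result = dict(combine_grouped_values(SumFn(), [("a", [1, 2, 3]), ("b", [10])]))
state = {"k": [1, 2, 3, 4]}
compact_state(SumFn(), state, "k")
```

A runner would presumably trigger something like `compact_state` only once the stored contents pass some size threshold; that policy is left out here.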

On Wed, May 23, 2018 at 1:57 PM Daniel Oliveira <[email protected]> wrote:

> Hi everyone,
>
> This email should be relevant to anyone interested in the portable
> pipeline model. A few months ago I sent out an email with this doc
> describing my ideas for modelling portable combines that support lifting:
> https://s.apache.org/beam-runner-api-combine-model
>
> Recently, after some offline discussion with other devs working on
> portability I'd like to add a new URN to the spec and I figured I should
> update the dev list again to get any feedback on the idea. If no one has
> any problems with the proposal I'm hoping to add this to the doc in a few
> days.
>
> *Proposal:*
> The doc currently only has one way to execute unlifted combines: Execute
> the ParDo within the CombinePerKey composite provided by the SDK.
>
> The proposal is to add a second way to execute unlifted combines: Adding a
> URN to represent an unlifted combine step executed after a GroupByKey
> transform, tentatively named "beam:transform:combine_grouped_values". Just
> like the other combine parts listed in the doc, the URN would be in a
> PTransform along with a CombineFn.
>
> *Reasoning:*
> Under the original spec the only way to execute unlifted combines is to
> execute a ParDo containing the logic of that combine. In the best case this
> is very straightforward: The Runner receives a CombinePerKey and sends it
> to the SDK Harness for execution without changing anything and the ParDo
> will execute the full combine.
>
> However, this causes issues when sending the provided ParDo for execution
> isn't straightforward. Situations may come up, due to runner implementation
> details, where a CombineGroupedValues needs to be executed and the ParDo
> associated with it is not easily retrieved or doesn't exist. This new URN
> provides a backup option so that a full combine can be executed even then.
>
> Thank you,
> Daniel Oliveira
