On Wed, Feb 14, 2024 at 10:28 AM Kenneth Knowles <k...@apache.org> wrote:

> Hi all,
>
> TL;DR I want to add some API like PTransform.getURN, toProto and
> fromProto, etc. to the Java SDK. I want to do this so that making a
> PTransform support portability is a natural part of writing the transform
> and not a totally separate thing with tons of boilerplate.
>
> What do you think?

+1. Currently users have to look in two different places: one to define the
transform and another to define the portable representation of the transform
(urn, toProto etc.). It's much easier to move these to a single interface
given that we are fully committed to portability.

> I think a particular API can be sorted out most easily in code (which I
> will prepare after gathering some feedback).

I think we basically want to move the API defined in the
TransformPayloadTranslator (or something similar to that) to the PTransform
class.

https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L597

The Python SDK already has toRunnerAPI/fromRunnerAPI interface methods defined
in the PTransform class.

https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/sdks/python/apache_beam/transforms/ptransform.py#L747

I would also like to call out the newly added PTransform constructor
toConfigRow/fromConfigRow interface methods, which I think should also move to
the PTransform class.

https://github.com/apache/beam/blob/bfa26a4d907d844aed4b938f88142ed0fc82c90f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L634

Thanks,
Cham

> We already have all the translation logic written, and porting a couple
> transforms to it will ensure the API has everything we need. We can refer
> to Python and Go for API ideas as well.
>
> Lots of context below, but you can skip it...
>
> -----
>
> When we first created the portability framework, we wanted the SDKs to be
> "standalone" and not depend on portability. We wanted portability to be an
> optional plugin that users could opt in to. That is totally the opposite
> now. We want portability to be the main place where Beam is defined, and
> then SDKs make that available in language idiomatic ways.
>
> Also when we first created the framework, we were experimenting with
> different serialization approaches and we wanted to be independent of
> protobuf and gRPC if we could. But now we are pretty committed and it would
> be a huge lift to use anything else.
>
> Finally, at the time we created the portability framework, we designed it
> to allow composites to have URNs and well-defined specs, rather than just
> be language-specific subgraphs, but we didn't really plan to make this easy.
>
> For all of the above, most users depend on portability and on proto. So
> separating them is not useful and just creates LOTS of boilerplate and
> friction for making new well-defined transforms.
>
> Kenn
>
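
For illustration only, the "single interface" idea discussed in this thread might look roughly like the sketch below. It does not use any Beam classes: PortableTransform, DECODERS, AddPrefix, the example URN, and the use of a raw byte[] in place of a proto payload are all hypothetical stand-ins, not the existing TransformPayloadTranslator API or a proposed final design.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PortableTransformSketch {

  /** Hypothetical: translation methods living on the transform itself. */
  interface PortableTransform {
    String getUrn();      // stable identifier for the transform
    byte[] toPayload();   // stand-in for a toProto-style method
  }

  /** Hypothetical registry mapping a URN to its fromProto-style factory. */
  static final Map<String, Function<byte[], PortableTransform>> DECODERS = new HashMap<>();

  /** Toy transform whose only configuration is a prefix string. */
  static final class AddPrefix implements PortableTransform {
    static final String URN = "example:transform:add_prefix:v1"; // made-up URN
    final String prefix;

    AddPrefix(String prefix) {
      this.prefix = prefix;
    }

    @Override
    public String getUrn() {
      return URN;
    }

    @Override
    public byte[] toPayload() {
      return prefix.getBytes(StandardCharsets.UTF_8);
    }

    /** Rebuilds the transform from its serialized configuration. */
    static AddPrefix fromPayload(byte[] payload) {
      return new AddPrefix(new String(payload, StandardCharsets.UTF_8));
    }
  }

  static {
    // The transform author registers the decoder next to the transform definition.
    DECODERS.put(AddPrefix.URN, AddPrefix::fromPayload);
  }

  /** Serializes a transform and rebuilds it by looking up its URN. */
  static PortableTransform roundTrip(PortableTransform transform) {
    Function<byte[], PortableTransform> decoder = DECODERS.get(transform.getUrn());
    if (decoder == null) {
      throw new IllegalArgumentException("Unknown URN: " + transform.getUrn());
    }
    return decoder.apply(transform.toPayload());
  }

  public static void main(String[] args) {
    AddPrefix restored = (AddPrefix) roundTrip(new AddPrefix("beam-"));
    System.out.println(restored.getUrn() + " prefix=" + restored.prefix);
  }
}
```

The point of the sketch is only that the URN, the serialization, and the deserialization sit next to the transform definition, so an author has one place to look instead of a separate translator class.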