On Wed, Feb 14, 2024 at 10:28 AM Kenneth Knowles <k...@apache.org> wrote:
>
> Hi all,
>
> TL;DR I want to add some API like PTransform.getURN, toProto and fromProto,
> etc. to the Java SDK. I want to do this so that making a PTransform support
> portability is a natural part of writing the transform and not a totally
> separate thing with tons of boilerplate.
>
> What do you think?

Huge +1 to this direction. IMHO one of the most fundamental things about Beam
is its model. Originally this was only expressed in a specific SDK (Java) and
then got ported to others, but now that we have portability it's expressed in
a language-independent way. Keeping the two separate in Java is not buying us
anything and causes a huge amount of boilerplate that'd be great to remove.
Removing it would also make the essential model more front-and-center.

> I think a particular API can be sorted out most easily in code (which I will
> prepare after gathering some feedback).
>
> We already have all the translation logic written, and porting a couple
> transforms to it will ensure the API has everything we need. We can refer to
> Python and Go for API ideas as well.
>
> Lots of context below, but you can skip it...
>
> -----
>
> When we first created the portability framework, we wanted the SDKs to be
> "standalone" and not depend on portability. We wanted portability to be an
> optional plugin that users could opt in to. That is totally the opposite now.
> We want portability to be the main place where Beam is defined, and then SDKs
> make that available in language idiomatic ways.
>
> Also when we first created the framework, we were experimenting with
> different serialization approaches and we wanted to be independent of
> protobuf and gRPC if we could. But now we are pretty committed and it would
> be a huge lift to use anything else.
>
> Finally, at the time we created the portability framework, we designed it to
> allow composites to have URNs and well-defined specs, rather than just be
> language-specific subgraphs, but we didn't really plan to make this easy.
>
> For all of the above, most users depend on portability and on proto. So
> separating them is not useful and just creates LOTS of boilerplate and
> friction for making new well-defined transforms.
>
> Kenn
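
For discussion only, here is a rough sketch of the shape an API like the one in
the TL;DR (getURN, toProto, fromProto) might take when the URN and the
serialization live on the transform itself. Everything in it is an assumption
made for illustration: Spec is a hand-written stand-in for a URN-plus-payload
proto message, and PortableTransform, Registry, AddPrefix and the URN string
are hypothetical names. None of these are existing Beam types, and the real
API would be settled in code as Kenn suggests.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: none of these types are Beam's.
class PortableTransformSketch {

  /** Hypothetical stand-in for a proto message carrying a URN and a payload. */
  static final class Spec {
    final String urn;
    final byte[] payload;

    Spec(String urn, byte[] payload) {
      this.urn = urn;
      this.payload = payload;
    }
  }

  /** Hypothetical contract: the transform itself knows its URN and payload. */
  interface PortableTransform {
    String getUrn();

    byte[] toPayload();

    String apply(String input);

    // Analogue of "toProto": bundle the URN and payload into one message.
    default Spec toSpec() {
      return new Spec(getUrn(), toPayload());
    }
  }

  /** Maps URNs to decoders, the analogue of "fromProto". */
  static final class Registry {
    private final Map<String, Function<byte[], PortableTransform>> decoders =
        new HashMap<>();

    void register(String urn, Function<byte[], PortableTransform> decoder) {
      decoders.put(urn, decoder);
    }

    PortableTransform fromSpec(Spec spec) {
      Function<byte[], PortableTransform> decoder = decoders.get(spec.urn);
      if (decoder == null) {
        throw new IllegalArgumentException("No transform registered for URN: " + spec.urn);
      }
      return decoder.apply(spec.payload);
    }
  }

  /** Toy transform whose only configuration is a string prefix. */
  static final class AddPrefix implements PortableTransform {
    static final String URN = "example:transform:add_prefix:v1"; // made-up URN

    private final String prefix;

    AddPrefix(String prefix) {
      this.prefix = prefix;
    }

    static AddPrefix fromPayload(byte[] payload) {
      return new AddPrefix(new String(payload, StandardCharsets.UTF_8));
    }

    @Override
    public String getUrn() {
      return URN;
    }

    @Override
    public byte[] toPayload() {
      return prefix.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String apply(String input) {
      return prefix + input;
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register(AddPrefix.URN, AddPrefix::fromPayload);

    // Serialize the transform, then rebuild it from the URN and payload alone.
    Spec spec = new AddPrefix("pre-").toSpec();
    PortableTransform roundTripped = registry.fromSpec(spec);
    System.out.println(roundTripped.getUrn() + " -> " + roundTripped.apply("x"));
  }
}
```

The point of the sketch is only that the transform author writes the URN and
the payload encoding once, next to the transform, instead of in a separate
translator class.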