+1. While the current Go SDK has always been portability first, it was designed at the time with the goal of being able to back out of that. As a result, it has a full, broad vertical slice of things to translate to protos and back again, which makes adding a new core transform difficult.
I have an experimental hobby implementation of a Go SDK for prototyping things (mostly seeing if Go generics can make a pipeline compile-time typesafe, and the answer is yes... but that's a different email). There I went with emitting a FunctionSpec (URN and payload), the environment ID, and the UniqueName, while inputs and outputs were handled with common code. I still kept execution-side translation graph based at the time, because of the lost type information, which required additional graph context to build the execution side with the right types (e.g. for SDK-side source, sink, and flatten handling).

So I question whether full symmetry is required. E.g. there's no reason for ExternalTransforms to be converted back on the execution side, or for GBKs (usually, that is; I'm looking at you, TypeScript SDK!). Conversely, there are "execution side only" transforms that are never directly written by a pipeline or transform author, but are necessary to execute SDK side (combine or SDF components, for example), even though those have single user-side constructs. That just implies that the toProto and fromProto parts are separable, though.

But that's just that specific experimental design for that specific language's affordances. It's definitely a big plus to be able to see all the bits for a single transform in one file, instead of trying to find the 5-8 different places one must add a registration for it. More so in Java, where such handler registrations can be done via class annotations!

Robert Burke
Beam Go Busybody

On Thu, Feb 15, 2024, 10:37 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:

> On Wed, Feb 14, 2024 at 10:28 AM Kenneth Knowles <k...@apache.org> wrote:
> >
> > Hi all,
> >
> > TL;DR I want to add some API like PTransform.getURN, toProto and fromProto, etc. to the Java SDK. I want to do this so that making a PTransform support portability is a natural part of writing the transform and not a totally separate thing with tons of boilerplate.
> >
> > What do you think?
>
> Huge +1 to this direction.
>
> IMHO one of the most fundamental things about Beam is its model. Originally this was only expressed in a specific SDK (Java) and then got ported to others, but now that we have portability it's expressed in a language-independent way.
>
> The fact that we keep these separate in Java is not buying us anything, and causes a huge amount of boilerplate that'd be great to remove, as well as making the essential model more front-and-center.
>
> > I think a particular API can be sorted out most easily in code (which I will prepare after gathering some feedback).
> >
> > We already have all the translation logic written, and porting a couple transforms to it will ensure the API has everything we need. We can refer to Python and Go for API ideas as well.
> >
> > Lots of context below, but you can skip it...
> >
> > -----
> >
> > When we first created the portability framework, we wanted the SDKs to be "standalone" and not depend on portability. We wanted portability to be an optional plugin that users could opt in to. That is totally the opposite now. We want portability to be the main place where Beam is defined, and then SDKs make that available in language idiomatic ways.
> >
> > Also when we first created the framework, we were experimenting with different serialization approaches and we wanted to be independent of protobuf and gRPC if we could. But now we are pretty committed and it would be a huge lift to use anything else.
> >
> > Finally, at the time we created the portability framework, we designed it to allow composites to have URNs and well-defined specs, rather than just be language-specific subgraphs, but we didn't really plan to make this easy.
> >
> > For all of the above, most users depend on portability and on proto. So separating them is not useful and just creates LOTS of boilerplate and friction for making new well-defined transforms.
> >
> > Kenn
>
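The Go design described in the reply (transform authors emit only a FunctionSpec, shared code handles the unique name, environment, and inputs/outputs, and the fromProto side is separable and optional) can be sketched roughly as below. This is a self-contained illustration under stated assumptions, not the Beam Go SDK API: `FunctionSpec` and `TransformProto` are hypothetical local stand-ins for the runner API protos, and `toProto`, `fromProto`, `decoders`, `withSuffix`, its URN, and the `go-env` environment ID are all made up for illustration. Only the GBK URN is a standard Beam URN.

```go
package main

import "fmt"

// FunctionSpec is a hypothetical local stand-in for the runner API message of
// the same name: a URN plus an opaque payload.
type FunctionSpec struct {
	URN     string
	Payload []byte
}

// TransformProto is a simplified, hypothetical stand-in for the PTransform
// message, keeping only the fields mentioned in the thread.
type TransformProto struct {
	UniqueName    string
	Spec          FunctionSpec
	EnvironmentID string
	Inputs        map[string]string // local name -> PCollection ID
	Outputs       map[string]string // local name -> PCollection ID
}

// Transform is the only part a transform author writes for construction.
type Transform interface {
	URN() string
	Payload() ([]byte, error)
}

// toProto is shared code: it wraps the author's spec with the unique name,
// environment ID, and input/output wiring.
func toProto(name, envID string, t Transform, ins, outs map[string]string) (*TransformProto, error) {
	payload, err := t.Payload()
	if err != nil {
		return nil, fmt.Errorf("encoding payload for %q: %w", name, err)
	}
	return &TransformProto{
		UniqueName:    name,
		Spec:          FunctionSpec{URN: t.URN(), Payload: payload},
		EnvironmentID: envID,
		Inputs:        ins,
		Outputs:       outs,
	}, nil
}

// decoders is a separate, optional execution-side registry keyed by URN.
// Transforms that never run in the SDK (e.g. a runner-executed GBK) simply
// do not register a decoder.
var decoders = map[string]func(payload []byte) (Transform, error){}

// fromProto rebuilds an SDK-executed transform from its FunctionSpec.
func fromProto(p *TransformProto) (Transform, error) {
	dec, ok := decoders[p.Spec.URN]
	if !ok {
		return nil, fmt.Errorf("no SDK-side decoder for URN %q", p.Spec.URN)
	}
	return dec(p.Spec.Payload)
}

// groupByKey has no configuration, so its payload is empty.
type groupByKey struct{}

func (groupByKey) URN() string              { return "beam:transform:group_by_key:v1" }
func (groupByKey) Payload() ([]byte, error) { return nil, nil }

// withSuffix is a hypothetical SDK-executed transform whose payload is a string.
type withSuffix struct{ Suffix string }

func (withSuffix) URN() string                { return "example:transform:with_suffix:v1" }
func (w withSuffix) Payload() ([]byte, error) { return []byte(w.Suffix), nil }

func init() {
	// Registered next to the transform definition, in the same file.
	decoders[withSuffix{}.URN()] = func(b []byte) (Transform, error) {
		return withSuffix{Suffix: string(b)}, nil
	}
}

func main() {
	// Runner-executed transforms such as GBK typically leave the environment ID empty.
	gbk, _ := toProto("Count/GBK", "", groupByKey{},
		map[string]string{"i0": "pc1"}, map[string]string{"i0": "pc2"})
	ws, _ := toProto("AddBang", "go-env", withSuffix{Suffix: "!"},
		map[string]string{"i0": "pc2"}, map[string]string{"i0": "pc3"})

	_, err := fromProto(gbk)
	fmt.Println(err) // no decoder registered for GBK, by design
	back, err := fromProto(ws)
	fmt.Println(back, err)
}
```

Keeping the decoder registration in `init()` beside the transform mirrors the point about seeing all the bits of a transform in one file, while leaving the execution side free to skip transforms it never runs.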