On Wed, Feb 14, 2024 at 10:28 AM Kenneth Knowles <k...@apache.org> wrote:
>
> Hi all,
>
> TL;DR I want to add some API like PTransform.getURN, toProto and fromProto, 
> etc. to the Java SDK. I want to do this so that making a PTransform support 
> portability is a natural part of writing the transform and not a totally 
> separate thing with tons of boilerplate.
>
> What do you think?

Huge +1 to this direction.

IMHO one of the most fundamental things about Beam is its model.
Originally this was only expressed in a specific SDK (Java) and then
got ported to others, but now that we have portability it's expressed
in a language-independent way.

The fact that we keep these separate in Java is not buying us
anything, and it causes a huge amount of boilerplate. Removing that
boilerplate would be great, and would also make the essential model
more front-and-center.
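
To make the idea concrete, here is a rough, self-contained sketch of
the shape such an API could take. It is not the proposed Beam API and
does not use any Beam classes: the PortableTransform interface, the
DECODERS registry, the AddPrefix transform, and its URN are all
hypothetical names made up for illustration, and a plain byte[]
stands in for the proto payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PortableTransformSketch {

  /** Hypothetical: the methods a transform implements to be portable. */
  interface PortableTransform {
    String getUrn();

    // Stand-in for toProto(); a real API would return a proto message.
    byte[] toPayload();
  }

  /** Hypothetical registry from URN to decoder (the fromProto side). */
  static final Map<String, Function<byte[], PortableTransform>> DECODERS =
      new HashMap<>();

  /** Example transform whose whole spec is a single prefix string. */
  static final class AddPrefix implements PortableTransform {
    // Made-up URN, for illustration only.
    static final String URN = "example:transform:add_prefix:v1";

    final String prefix;

    AddPrefix(String prefix) {
      this.prefix = prefix;
    }

    @Override
    public String getUrn() {
      return URN;
    }

    @Override
    public byte[] toPayload() {
      return prefix.getBytes(StandardCharsets.UTF_8);
    }

    static AddPrefix fromPayload(byte[] payload) {
      return new AddPrefix(new String(payload, StandardCharsets.UTF_8));
    }
  }

  // Translation lives next to the transform instead of in a separate
  // translator class; registering it is a single line.
  static {
    DECODERS.put(AddPrefix.URN, AddPrefix::fromPayload);
  }

  /** Rebuilds a transform from its URN and payload. */
  static PortableTransform decode(String urn, byte[] payload) {
    Function<byte[], PortableTransform> decoder = DECODERS.get(urn);
    if (decoder == null) {
      throw new IllegalArgumentException("Unknown URN: " + urn);
    }
    return decoder.apply(payload);
  }

  public static void main(String[] args) {
    AddPrefix original = new AddPrefix("hello-");
    PortableTransform decoded =
        decode(original.getUrn(), original.toPayload());
    System.out.println(
        decoded.getUrn() + " " + ((AddPrefix) decoded).prefix);
  }
}
```

The point of the sketch is only that the URN, the encoding, and the
decoding sit on the transform itself, so supporting portability is
part of writing the transform rather than a separate step.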

> I think a particular API can be sorted out most easily in code (which I will 
> prepare after gathering some feedback).
>
> We already have all the translation logic written, and porting a couple 
> transforms to it will ensure the API has everything we need. We can refer to 
> Python and Go for API ideas as well.
>
> Lots of context below, but you can skip it...
>
> -----
>
> When we first created the portability framework, we wanted the SDKs to be 
> "standalone" and not depend on portability. We wanted portability to be an 
> optional plugin that users could opt in to. That is totally the opposite now. 
> We want portability to be the main place where Beam is defined, and then SDKs 
> make that available in language idiomatic ways.
>
> Also when we first created the framework, we were experimenting with 
> different serialization approaches and we wanted to be independent of 
> protobuf and gRPC if we could. But now we are pretty committed and it would 
> be a huge lift to use anything else.
>
> Finally, at the time we created the portability framework, we designed it to 
> allow composites to have URNs and well-defined specs, rather than just be 
> language-specific subgraphs, but we didn't really plan to make this easy.
>
> For all of the above, most users depend on portability and on proto. So 
> separating them is not useful and just creates LOTS of boilerplate and 
> friction for making new well-defined transforms.
>
> Kenn
