+1 for the simplification. On Tue, May 30, 2023 at 12:33 PM Robert Bradshaw <rober...@google.com> wrote:
> Yeah. Essentially one needs do (1) name the arguments and (2) implement > the transform. Hopefully (1) could be done in a concise way that allows for > easy consumption from both Java and cross-language. > +1 but I think the hard part today is to convert existing PTransforms to be schema-aware transform compatible (for example, change input/output types and make sure parameters take Beam Schema compatible types). But this makes sense for new transforms. > On Tue, May 30, 2023 at 12:25 PM Byron Ellis <byronel...@google.com> > wrote: > >> Or perhaps the other way around? If you have a Schema we can >> auto-generate the associated builder on the PTransform? Either way, more >> DRY. >> >> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev < >> dev@beam.apache.org> wrote: >> >>> +1 to this simplification, it's a historical artifact that provides no >>> value. >>> >>> I would love it if we also looked into ways to auto-generate the >>> SchemaTransformProvider (e.g. via introspection if a transform takes a >>> small number of arguments, or uses the standard builder pattern...), >>> ideally with something as simple as adding a decorator to the PTransform >>> itself. >>> >>> >>> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev < >>> dev@beam.apache.org> wrote: >>> >>>> Hey everyone, >>>> >>>> I was looking at how we use SchemaTransforms in our expansion service. >>>> From what I see, there may be a redundant step in developing >>>> SchemaTransforms. Currently, we have 3 pieces: >>>> - SchemaTransformProvider [1] >>>> - A configuration object >>>> - SchemaTransform [2] >>>> >>>> The API is generally used like this: >>>> 1. The SchemaTransformProvider takes a configuration object and returns >>>> a SchemaTransform >>>> 2. The SchemaTransform is used to build a PTransform according to the >>>> configuration >>>> >>>> In these steps, the SchemaTransform class seems unnecessary. We can >>>> combine the two steps if we have SchemaTransformProvider return the >>>> PTransform directly. >>>> >>>> We can then remove the SchemaTransform class as it will be obsolete. >>>> This should be safe to do; the only place it's used in our API is here [3], >>>> and that can be simplified if we make this change (we'd just trim ` >>>> .buildTransform()` off the end as `provider.from(configRow)` will >>>> directly return the PTransform). >>>> >>>> I'd like to first mention that I was not involved in the design process >>>> of this API so I may be missing some information on why it was set up this >>>> way. >>>> >>>> A few developers already raised questions about how there's seemingly >>>> unnecessary boilerplate involved in making a Java transform portable. I >>>> wasn't involved in the design process of this API so I may be missing some >>>> information, but my assumption is this was designed to follow the pattern >>>> of the previous iteration of this API (SchemaIO): SchemaIOProvider[4] -> >>>> SchemaIO[5] -> PTransform. However, with the newer >>>> SchemaTransformProvider API, we dropped a few methods and reduced the >>>> SchemaTransform class to have a generic buildTransform() method. See the >>>> example of PubsubReadSchemaTransformProvider [6], where the >>>> SchemaTransform interface and buildTransform method are implemented >>>> just to satisfy the requirement that SchemaTransformProvider::from >>>> return a SchemaTransform. >>>> >>>> I'm bringing this up because if we are looking to encourage >>>> contribution to cross-language use cases, we should make it simpler and >>>> less convoluted to develop portable transforms. >>>> >>>> There are a number of SchemaTransforms already developed, but applying >>>> these changes to them should be straightforward. If people think this is a >>>> good idea, I can open a PR and implement them. >>>> >>>> Best, >>>> Ahmed >>>> >>>> [1] >>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java >>>> [2] >>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java >>>> [3] >>>> https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138 >>>> [4] >>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java >>>> [5] >>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java >>>> [6] >>>> https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137 >>>> >>>