Or perhaps the other way around? If you have a Schema we can auto-generate
the associated builder on the PTransform? Either way, more DRY.

On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev <
dev@beam.apache.org> wrote:

> +1 to this simplification, it's a historical artifact that provides no
> value.
>
> I would love it if we also looked into ways to auto-generate the
> SchemaTransformProvider (e.g. via introspection if a transform takes a
> small number of arguments, or uses the standard builder pattern...),
> ideally with something as simple as adding a decorator to the PTransform
> itself.
>
>
> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev <
> dev@beam.apache.org> wrote:
>
>> Hey everyone,
>>
>> I was looking at how we use SchemaTransforms in our expansion service.
>> From what I see, there may be a redundant step in developing
>> SchemaTransforms. Currently, we have 3 pieces:
>> - SchemaTransformProvider [1]
>> - A configuration object
>> - SchemaTransform [2]
>>
>> The API is generally used like this:
>> 1. The SchemaTransformProvider takes a configuration object and returns a
>> SchemaTransform
>> 2. The SchemaTransform is used to build a PTransform according to the
>> configuration
>>
>> In these steps, the SchemaTransform class seems unnecessary. We can
>> combine the two steps if we have SchemaTransformProvider return the
>> PTransform directly.
>>
>> We can then remove the SchemaTransform class as it will be obsolete. This
>> should be safe to do; the only place it's used in our API is here [3], and
>> that can be simplified if we make this change (we'd just trim `
>> .buildTransform()` off the end as `provider.from(configRow)` will
>> directly return the PTransform).
>>
>> I'd like to first mention that I was not involved in the design process
>> of this API so I may be missing some information on why it was set up this
>> way.
>>
>> A few developers already raised questions about how there's seemingly
>> unnecessary boilerplate involved in making a Java transform portable. I
>> wasn't involved in the design process of this API so I may be missing some
>> information, but my assumption is this was designed to follow the pattern
>> of the previous iteration of this API (SchemaIO): SchemaIOProvider[4] ->
>> SchemaIO[5] -> PTransform. However, with the newer
>> SchemaTransformProvider API, we dropped a few methods and reduced the
>> SchemaTransform class to have a generic buildTransform() method. See the
>> example of PubsubReadSchemaTransformProvider [6], where the
>> SchemaTransform interface and buildTransform method are implemented just
>> to satisfy the requirement that SchemaTransformProvider::from return a
>> SchemaTransform.
>>
>> I'm bringing this up because if we are looking to encourage contribution
>> to cross-language use cases, we should make it simpler and less convoluted
>> to develop portable transforms.
>>
>> There are a number of SchemaTransforms already developed, but applying
>> these changes to them should be straightforward. If people think this is a
>> good idea, I can open a PR and implement them.
>>
>> Best,
>> Ahmed
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java
>> [3]
>> https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138
>> [4]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java
>> [5]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java
>> [6]
>> https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137
>>
>

Reply via email to