+1 for the simplification.

On Tue, May 30, 2023 at 12:33 PM Robert Bradshaw <rober...@google.com>
wrote:

> Yeah. Essentially one needs do (1) name the arguments and (2) implement
> the transform. Hopefully (1) could be done in a concise way that allows for
> easy consumption from both Java and cross-language.
>

+1 but I think the hard part today is to convert existing PTransforms to be
schema-aware transform compatible (for example, change input/output types
and make sure parameters take Beam Schema compatible types). But this makes
sense for new transforms.



> On Tue, May 30, 2023 at 12:25 PM Byron Ellis <byronel...@google.com>
> wrote:
>
>> Or perhaps the other way around? If you have a Schema we can
>> auto-generate the associated builder on the PTransform? Either way, more
>> DRY.
>>
>> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 to this simplification, it's a historical artifact that provides no
>>> value.
>>>
>>> I would love it if we also looked into ways to auto-generate the
>>> SchemaTransformProvider (e.g. via introspection if a transform takes a
>>> small number of arguments, or uses the standard builder pattern...),
>>> ideally with something as simple as adding a decorator to the PTransform
>>> itself.
>>>
>>>
>>> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I was looking at how we use SchemaTransforms in our expansion service.
>>>> From what I see, there may be a redundant step in developing
>>>> SchemaTransforms. Currently, we have 3 pieces:
>>>> - SchemaTransformProvider [1]
>>>> - A configuration object
>>>> - SchemaTransform [2]
>>>>
>>>> The API is generally used like this:
>>>> 1. The SchemaTransformProvider takes a configuration object and returns
>>>> a SchemaTransform
>>>> 2. The SchemaTransform is used to build a PTransform according to the
>>>> configuration
>>>>
>>>> In these steps, the SchemaTransform class seems unnecessary. We can
>>>> combine the two steps if we have SchemaTransformProvider return the
>>>> PTransform directly.
>>>>
>>>> We can then remove the SchemaTransform class as it will be obsolete.
>>>> This should be safe to do; the only place it's used in our API is here [3],
>>>> and that can be simplified if we make this change (we'd just trim `
>>>> .buildTransform()` off the end as `provider.from(configRow)` will
>>>> directly return the PTransform).
>>>>
>>>> I'd like to first mention that I was not involved in the design process
>>>> of this API so I may be missing some information on why it was set up this
>>>> way.
>>>>
>>>> A few developers already raised questions about how there's seemingly
>>>> unnecessary boilerplate involved in making a Java transform portable. I
>>>> wasn't involved in the design process of this API so I may be missing some
>>>> information, but my assumption is this was designed to follow the pattern
>>>> of the previous iteration of this API (SchemaIO): SchemaIOProvider[4] ->
>>>> SchemaIO[5] -> PTransform. However, with the newer
>>>> SchemaTransformProvider API, we dropped a few methods and reduced the
>>>> SchemaTransform class to have a generic buildTransform() method. See the
>>>> example of PubsubReadSchemaTransformProvider [6], where the
>>>> SchemaTransform interface and buildTransform method are implemented
>>>> just to satisfy the requirement that SchemaTransformProvider::from
>>>> return a SchemaTransform.
>>>>
>>>> I'm bringing this up because if we are looking to encourage
>>>> contribution to cross-language use cases, we should make it simpler and
>>>> less convoluted to develop portable transforms.
>>>>
>>>> There are a number of SchemaTransforms already developed, but applying
>>>> these changes to them should be straightforward. If people think this is a
>>>> good idea, I can open a PR and implement them.
>>>>
>>>> Best,
>>>> Ahmed
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java
>>>> [2]
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java
>>>> [3]
>>>> https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138
>>>> [4]
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java
>>>> [5]
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java
>>>> [6]
>>>> https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137
>>>>
>>>

Reply via email to