Re: [YAML/Python] SchemadParDo

Reuven Lax via dev Sun, 11 May 2025 19:35:28 -0700

My first thought is that this should go in contrib for now.

BTW, in the Java SDK field access is integrated directly into ParDo. For
example, you can write:


new DoFn<Row, OutputT>() {
   @ProcessElement
   public void process(@FieldAccess("field1") Type1 field1,
                       @FieldAccess("field2") Type2 field2) {
      ...
   }
}

It also supports selecting wildcards (e.g. @FieldAccess("top.*")).

I'm not sure how this pattern would translate into the Python SDK though.
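One conceivable Python translation, sketched purely as an illustration: a decorator records which (possibly nested, dot-separated) field each `process` parameter should receive, and a helper resolves those paths on the incoming row before calling it. `field_access`, `get_field` and `call_with_fields` are hypothetical names, not existing Beam APIs. Rows are modeled with `types.SimpleNamespace` because `beam.Row` likewise exposes its fields as attributes, and wildcards such as `top.*` are not handled.

```python
from types import SimpleNamespace


def field_access(**paths):
    """Hypothetical decorator: map each parameter name to a dotted field path.

    Loosely mirrors Java's @FieldAccess; this is NOT an existing Beam Python API.
    """
    def decorate(fn):
        fn.field_paths = paths  # remembered for call_with_fields()
        return fn
    return decorate


def get_field(row, path):
    """Resolve a dotted path such as "a.b.c" by successive attribute lookups."""
    value = row
    for name in path.split("."):
        value = getattr(value, name)
    return value


def call_with_fields(fn, row):
    """Call fn with each declared field of row passed as a keyword argument."""
    kwargs = {param: get_field(row, path) for param, path in fn.field_paths.items()}
    return fn(**kwargs)


@field_access(field1="field1", city="address.city")
def process(field1, city):
    yield f"{field1} lives in {city}"


row = SimpleNamespace(field1="Ada", address=SimpleNamespace(city="London"))
print(list(call_with_fields(process, row)))
# ['Ada lives in London']
```

Because `get_field` walks the path one attribute at a time, the same mechanism would also cover nested selections like `input_field="a.b.c"`.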

On Sat, May 10, 2025 at 3:35 AM Joey Tran <joey.t...@schrodinger.com> wrote:

> Not currently
>
> On Sat, May 10, 2025, 12:48 AM Reuven Lax <re...@google.com> wrote:
>
>> Does this work with nested fields? Can you specify input_field="a.b.c"?
>>
>> On Fri, May 9, 2025 at 7:18 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> Sure!
>>>
>>> Given a DoFn such as
>>>
>>> class SplitSentenceDoFn(beam.DoFn):
>>>     def process(self, sentence):
>>>         yield from sentence.split()
>>>
>>>
>>> You could use it with SchemadParDo as:
>>>
>>> (p
>>>  | beam.Create([pvalue.Row(element="hello world", id="id")])
>>>  | SchemadParDo(dofn=SplitSentenceDoFn(), input_field="element",
>>>                 output_field="word"))
>>>
>>> And it'd produce Row(word="hello", id="id") and Row(word="world",
>>> id="id").
>>>
>>> On Fri, May 9, 2025, 9:57 PM Reuven Lax via dev <dev@beam.apache.org>
>>> wrote:
>>>
>>>> Can you explain a bit how SchemadParDo works?
>>>>
>>>> On Fri, May 9, 2025 at 4:49 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> I've written a `SchemadParDo(input_field: str, output_field: str,
>>>>> dofn: DoFn)` transform that makes it easier to build a schema-aware
>>>>> transform from an existing DoFn.
>>>>>
>>>>> Is this something worth upstreaming into the Beam Python SDK? I wrote
>>>>> it to make it easier to convert our current set of DoFns into
>>>>> schema-aware DoFns for use with the YAML SDK. Just wanted to gauge
>>>>> interest before setting up the dev env again.
>>>>>
>>>>
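For reference, the per-element behavior Joey describes (the DoFn reads `input_field`, and each value it yields becomes a row whose `output_field` replaces `input_field` while other fields such as `id` pass through unchanged) can be sketched without Beam. This is an assumption-laden illustration, not the actual SchemadParDo code: `schemad_process` is a hypothetical helper, rows are `types.SimpleNamespace` objects standing in for `beam.Row`, and `SplitSentenceDoFn` here is a plain class rather than a `beam.DoFn` subclass.

```python
from types import SimpleNamespace


class SplitSentenceDoFn:
    """Plain-Python stand-in for the thread's DoFn (same process() body)."""

    def process(self, sentence):
        yield from sentence.split()


def schemad_process(dofn, row, input_field, output_field):
    """Hypothetical sketch of what SchemadParDo might do for one element.

    Passes row.<input_field> to dofn.process() and, for every value it
    yields, emits a new row where output_field replaces input_field and
    all remaining fields are copied through unchanged.
    """
    other_fields = dict(vars(row))          # copy so the input row is untouched
    value = other_fields.pop(input_field)   # the field the DoFn consumes
    for result in dofn.process(value):
        yield SimpleNamespace(**{output_field: result}, **other_fields)


row = SimpleNamespace(element="hello world", id="id")
for out in schemad_process(SplitSentenceDoFn(), row,
                           input_field="element", output_field="word"):
    print(vars(out))
# {'word': 'hello', 'id': 'id'}
# {'word': 'world', 'id': 'id'}
```

Dropping `input_field` from the output matches the example rows in the thread, which contain `word` and `id` but no `element`.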
