My first thought is that this should go in contrib for now.

BTW, in the Java SDK field access is integrated directly into ParDo. For
example, you can write:

new DoFn<>() {
  @ProcessElement
  public void process(
      @FieldAccess("field1") Type1 field1,
      @FieldAccess("field2") Type2 field2) {
    ...
  }
}

It also supports selecting wildcards (e.g. @FieldAccess("top.*")).

I'm not sure how this pattern would translate into the Python SDK though.
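
To make the SchemadParDo behaviour described further down the thread
concrete, here is a minimal pure-Python sketch of the semantics only. It is
not Joey's implementation and does not use Beam: rows are modelled as plain
dicts, and the names `schemad_apply` and `split_sentence` are hypothetical
stand-ins. Like the transform as described, it only handles a top-level
field name, not a nested path such as "a.b.c".

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]  # assumption: a dict stands in for a Beam Row


def split_sentence(sentence: str) -> Iterator[str]:
    """Plays the role of SplitSentenceDoFn.process from the thread."""
    yield from sentence.split()


def schemad_apply(
    process: Callable[[Any], Iterable[Any]],
    rows: Iterable[Row],
    input_field: str,
    output_field: str,
) -> Iterator[Row]:
    """Hypothetical helper: pass one field to `process`, emit a row per output."""
    for row in rows:
        # Every field except the input field is carried over unchanged.
        passthrough = {k: v for k, v in row.items() if k != input_field}
        for result in process(row[input_field]):
            yield {output_field: result, **passthrough}


rows = [{"element": "hello world", "id": "id"}]
words = list(schemad_apply(split_sentence, rows, "element", "word"))
```

Each output of the wrapped function becomes its own row under the output
field name, which is the behaviour in the "hello world" example quoted below.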

On Sat, May 10, 2025 at 3:35 AM Joey Tran <joey.t...@schrodinger.com> wrote:

> Not currently
>
> On Sat, May 10, 2025, 12:48 AM Reuven Lax <re...@google.com> wrote:
>
>> Does this work with nested fields? Can you specify Input_field="a.b.c"?
>>
>> On Fri, May 9, 2025 at 7:18 PM Joey Tran <joey.t...@schrodinger.com>
>> wrote:
>>
>>> Sure!
>>>
>>> Given a DoFn that has...
>>>
>>> def process(self, sentence):
>>>     yield from sentence.split()
>>>
>>>
>>> You could use it with SchemadParDo as:
>>>
>>> (p | beam.Create([pvalue.Row(element="hello world", id="id")])
>>>    | SchemadParDo(SplitSentenceDoFn(), input_field="element",
>>>                   output_field="word"))
>>>
>>> And it'd produce Row(word="hello", id="id") and Row(word="world",
>>> id="id")
>>>
>>> On Fri, May 9, 2025, 9:57 PM Reuven Lax via dev <dev@beam.apache.org>
>>> wrote:
>>>
>>>> Can you explain a bit how SchemadParDo works?
>>>>
>>>> On Fri, May 9, 2025 at 4:49 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> I've written a `SchemadParDo(input_field: str, output_field,
>>>>> dofn:DoFn)` transform for more easily writing a Schemad transform given a
>>>>> DoFn.
>>>>>
>>>>> Is this something worth upstreaming into the Beam Python SDK? I wrote
>>>>> it to make it easier to convert our current set of DoFns into
>>>>> schemad DoFns for use with the YAML SDK. Just wanted to gauge interest
>>>>> before setting up the dev env again.
>>>>>
>>>>
