On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette <bhule...@google.com> wrote:
>
> Ah yes I'm +1 for that approach too - it would let us leverage all the 
> schema-inference already in the Java SDK for translating configuration 
> objects which would be great.
> Things on the Python side would be trickier as schemas don't formally support 
> all the types you can use in the PayloadBuilder implementations [1] yet, just 
> NamedTuple. For now we could just make the PayloadBuilder implementations 
> generate Rows without making that translation available for use in 
> PCollections.

Yes, though eventually it might be nice to support all of these
various types as schema'd PCollection elements as well.

> Do we need to worry about update compatibility for 
> ExternalConfigurationPayload?

Technically, each URN defines their payload, and the fact that we've
settled on ExternalConfigurationPayload is a convention. On a
practical note, we haven't declared these protos stable yet. (I would
like to do so before we drop support for Python 2, as external
transforms are a possible escape hatch and the first strong motivation
to have external transforms that span Beam versions).

> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py
>
> On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw <rober...@google.com> wrote:
>>
>> I would be in favor of just using a schema to store the entire
>> configuration. The reason we went with what we have to day is that we
>> didn't have cross language schemas yet.
>>
>> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette <bhule...@google.com> wrote:
>> >
>> > Hi everyone,
>> > I noticed that currently the ExternalConfigurationPayload uses a list of 
>> > coder URNs to represent the coder that was used to serialize each 
>> > configuration field [1]. This seems acceptable at first blush, but there's 
>> > one notable issue: it has no place to store a payload for the coder. Most 
>> > standard coders don't use a payload so it's not a problem, but row coder 
>> > does use a payload to store it's schema, which means it can't be used in 
>> > an ExternalConfigurationPayload today.
>> >
>> > Is there a reason not to just use the Coder message [2] in 
>> > ExternalConfigurationPayload instead of a list of coder URNs? That would 
>> > work with row coder, and it would also make it easier to re-use logic for 
>> > translating Pipeline protos.
>> >
>> > I'd be happy to make this change, but I wanted to ask on dev@ in case 
>> > there's something I'm missing here.
>> >
>> > Brian
>> >
>> > [1] 
>> > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
>> > [2] 
>> > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555

Reply via email to