I can help review the Java change.

-Rui

On Thu, Aug 6, 2020 at 9:53 AM Brian Hulette <[email protected]> wrote:

> The PR for this is up now: https://github.com/apache/beam/pull/12481
> Any volunteers to help review? We may want a separate reviewer for Python
> and Java changes.
>
> Brian
>
> On Wed, Aug 5, 2020 at 9:00 AM Brian Hulette <[email protected]> wrote:
>
>> What I'm working on changes ExternalConfigurationPayload [1] to this:
>>
>> message ExternalConfigurationPayload {
>>   // A schema for use in beam:coder:row:v1
>>   Schema schema = 1;
>>
>>   // A payload which can be decoded using beam:coder:row:v1 and the given
>> schema.
>>   bytes payload = 2;
>> }
>>
>> The calling SDK can infer a schema from a user configuration type (in
>> Python we can make minor changes to SchemaBasedPayloadBuilder for this),
>> and use it's implementation of beam:coder:row:v1 to encode an instance of
>> that type to the payload.
>>
>> Similarly the expanding SDK can infer a schema from a user configuration
>> type and map the encoded row to an instance of the user type, assuming the
>> schemas are compatible.
>>
>> Brian
>>
>> [1]
>> https://github.com/apache/beam/blob/86b8326b4ebc4e217585847102743cc1d1af367a/model/pipeline/src/main/proto/external_transforms.proto#L42
>>
>> On Wed, Aug 5, 2020 at 2:04 AM Maximilian Michels <[email protected]> wrote:
>>
>>> +1
>>>
>>> The format to store coders is not set in stone, it was a first version
>>> to make external configuration work. Using the Coder message would be
>>> better.
>>>
>>> As for using Schema to store the configuration, could somebody fill me
>>> in how that would work?
>>>
>>> -Max
>>>
>>> On 04.08.20 02:01, Brian Hulette wrote:
>>> > I've opened BEAM-10571 [1] for this, and I'm most of the way to an
>>> > implementation now. Aiming to have it done before the 2.24.0 cut since
>>> > it will be the last release with python 2 support.
>>> >
>>> > [1] https://issues.apache.org/jira/browse/BEAM-10571
>>> >
>>> > On Wed, Jul 15, 2020 at 9:03 AM Chamikara Jayalath <
>>> [email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> >
>>> >
>>> >     On Fri, Jul 10, 2020 at 4:47 PM Robert Bradshaw <
>>> [email protected]
>>> >     <mailto:[email protected]>> wrote:
>>> >
>>> >         On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette
>>> >         <[email protected] <mailto:[email protected]>> wrote:
>>> >          >
>>> >          > Ah yes I'm +1 for that approach too - it would let us
>>> >         leverage all the schema-inference already in the Java SDK for
>>> >         translating configuration objects which would be great.
>>> >          > Things on the Python side would be trickier as schemas don't
>>> >         formally support all the types you can use in the
>>> PayloadBuilder
>>> >         implementations [1] yet, just NamedTuple. For now we could just
>>> >         make the PayloadBuilder implementations generate Rows without
>>> >         making that translation available for use in PCollections.
>>> >
>>> >
>>> >     This will be a good opportunity to add some sort of a minimal
>>> Python
>>> >     type to Beam schema mapping :)
>>> >
>>> >
>>> >         Yes, though eventually it might be nice to support all of these
>>> >         various types as schema'd PCollection elements as well.
>>> >
>>> >          > Do we need to worry about update compatibility for
>>> >         ExternalConfigurationPayload?
>>> >
>>> >         Technically, each URN defines their payload, and the fact that
>>> we've
>>> >         settled on ExternalConfigurationPayload is a convention. On a
>>> >         practical note, we haven't declared these protos stable yet. (I
>>> >         would
>>> >         like to do so before we drop support for Python 2, as external
>>> >         transforms are a possible escape hatch and the first strong
>>> >         motivation
>>> >         to have external transforms that span Beam versions).
>>> >
>>> >
>>> >     +1
>>> >
>>> >
>>> >          > [1]
>>> >
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py
>>> >          >
>>> >          > On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw
>>> >         <[email protected] <mailto:[email protected]>> wrote:
>>> >          >>
>>> >          >> I would be in favor of just using a schema to store the
>>> entire
>>> >          >> configuration. The reason we went with what we have to day
>>> >         is that we
>>> >          >> didn't have cross language schemas yet.
>>> >          >>
>>> >          >> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette
>>> >         <[email protected] <mailto:[email protected]>> wrote:
>>> >          >> >
>>> >          >> > Hi everyone,
>>> >          >> > I noticed that currently the ExternalConfigurationPayload
>>> >         uses a list of coder URNs to represent the coder that was used
>>> >         to serialize each configuration field [1]. This seems
>>> acceptable
>>> >         at first blush, but there's one notable issue: it has no place
>>> >         to store a payload for the coder. Most standard coders don't
>>> use
>>> >         a payload so it's not a problem, but row coder does use a
>>> >         payload to store it's schema, which means it can't be used in
>>> an
>>> >         ExternalConfigurationPayload today.
>>> >          >> >
>>> >          >> > Is there a reason not to just use the Coder message [2]
>>> in
>>> >         ExternalConfigurationPayload instead of a list of coder URNs?
>>> >         That would work with row coder, and it would also make it
>>> easier
>>> >         to re-use logic for translating Pipeline protos.
>>> >          >> >
>>> >          >> > I'd be happy to make this change, but I wanted to ask on
>>> >         dev@ in case there's something I'm missing here.
>>> >          >> >
>>> >          >> > Brian
>>> >          >> >
>>> >          >> > [1]
>>> >
>>> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
>>> >          >> > [2]
>>> >
>>> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555
>>> >
>>>
>>

Reply via email to