Hello!  I took a quick look -- I think there's some potential
confusion on this line[1] and the reflectData argument being passed
into the new constructor.

If I'm reading correctly, the argument passed in is never actually
used in the eventual ReflectDatumReader/Writer, and it's a different
type than the "this.reflectData" member in the instance.

To restore the original behaviour, I'd probably recommend just passing
in a boolean argument instead, something very explicit along the lines
of "useReflectionOnSpecificData", or "alwaysUseAvroReflect".  That's
also the reason to consider a very simple AvroReflectCoder.of(...)
instead of an AvroCoder.of(x, y, true) factory method for readability,
like what was done with AvroGenericCoder.

It would be easier to comment on a PR, don't hesitate!

All my best, Ryan

[1]https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311


On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty
<[email protected]> wrote:
>
> Thanks! I put up a branch with a possible solution for adding the Reflect 
> option to AvroCoder with as minimal a code change as possible [1] - would 
> love to get anyone's thoughts on this.
>
> - Claire
>
> On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote:
>>
>>
>>
>> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty <[email protected]> 
>> wrote:
>>>
>>> Hi Ahmet! Yes, I think it should be documented in the release notes.
>>
>>
>> Great. +Vitaly, do you want to add the breaking change to the release notes, 
>> since this was related your change.
>>
>>>
>>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a 
>>> configuration option to the existing AvroCoder?
>>
>>
>> I am not sure I am the best person to answer this. Second option, of adding 
>> a configuration to the existing AvroCoder, rather than creating a new coder 
>> makes more sense to me.
>>
>> That said, people who might have an opinion: /cc @Ismaël Mejía @Kenneth 
>> Knowles @Lukasz Cwik +Vitaly
>>
>>>
>>>
>>> Thanks,
>>> Claire
>>>
>>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote:
>>>>
>>>> Is this something we need to add to the 2.30.0 release notes 
>>>> (https://beam.apache.org/blog/beam-2.30.0/) as a breaking change?
>>>>
>>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote:
>>>>>
>>>>> Hello!  Good catch, I'm taking a look, but it looks like you're
>>>>> entirely correct and there isn't any obvious workaround.  I guess you
>>>>> could regenerate every SpecificRecord class in order to add the
>>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be
>>>>> necessary.
>>>>>
>>>>> From the Avro perspective, we should always have been using
>>>>> SpecificDatumReader/Writer for all generated SpecificRecords...  We
>>>>> would still have the same Utf8 and .toString problems, but at least
>>>>> there would be no change in behaviour during migration :/
>>>>>
>>>>> As a side note, the Apache Avro project should probably reconsider
>>>>> whether the Utf8 class still adds any value with modern JVMs!  If I
>>>>> understand correctly, it was originally in place because Hadoop had a
>>>>> performance boost when it could reuse mutable data containers.
>>>>>
>>>>> Moving forward, I think your suggestion is the most pragmatic: either
>>>>> add a configuration option to AvroCoder to always drop to ReflectData,
>>>>> or explicitly provide a ReflectAvroCoder that only uses reflection.
>>>>>
>>>>> I took the liberty of creating the JIRA
>>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could
>>>>> create an link an Avro issue!  Please feel free to update if I missed
>>>>> anything.
>>>>>
>>>>> Best regards, Ryan
>>>>>
>>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty
>>>>> <[email protected]> wrote:
>>>>> >
>>>>> > Hi all,
>>>>> >
>>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some 
>>>>> > unexpected runtime issues due to changes from BEAM-2303. This PR 
>>>>> > updated  AvroCoder to use SpecificDatum{Reader,Writer} instead 
>>>>> > ofReflectDatum{Reader,Writer} in its implementation.
>>>>> >
>>>>> > When using the Reflect* suite, Avro string fields have getters/setters 
>>>>> > defined with a CharSequence signature, but are by default decoded as 
>>>>> > java.lang.Strings [1]. But the Specific* suitehas a different default 
>>>>> > behavior for decoding Avro string fields: unless the Avro schema 
>>>>> > property "java-class" is set to "java.lang.String", the decoded 
>>>>> > CharSequences will by default be implemented as 
>>>>> > org.apache.avro.util.Utf8 objects [2].
>>>>> >
>>>>> > This is causing some migration pain for us as we're having to either 
>>>>> > add the java-class property to all string field schemas, or call 
>>>>> > .toString on a lot of fields we could just cast before. Additionally, 
>>>>> > Utf8 isn't Serializable and there's no default Coder representation for 
>>>>> > it. Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as 
>>>>> > well.I created a quick Gist to demonstrate the issue: [3].
>>>>> >
>>>>> > I'm wondering if there's any possibility of making the use of Reflect* 
>>>>> > vs Specific* configurable in AvroCoder, or maybe setting a default 
>>>>> > String type in the coder constructor.  If not, maybe this change should 
>>>>> > be documented in the release notes?
>>>>> >
>>>>> > Thanks,
>>>>> > Claire

Reply via email to