Hello! I took a quick look -- I think there's some potential confusion on this line[1] and the reflectData argument being passed into the new constructor.
If I'm reading correctly, the argument passed in is never actually used in the eventual ReflectDatumReader/Writer, and it's a different type than the "this.reflectData" member in the instance. To restore the original behaviour, I'd probably recommend just passing in a boolean argument instead, something very explicit along the lines of "useReflectionOnSpecificData", or "alwaysUseAvroReflect". That's also the reason to consider a very simple AvroReflectCoder.of(...) instead of an AvroCoder.of(x, y, true) factory method for readability, like what was done with AvroGenericCoder. It would be easier to comment on a PR, don't hesitate! All my best, Ryan [1]https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311 On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty <[email protected]> wrote: > > Thanks! I put up a branch with a possible solution for adding the Reflect > option to AvroCoder with as minimal a code change as possible [1] - would > love to get anyone's thoughts on this. > > - Claire > > On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote: >> >> >> >> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty <[email protected]> >> wrote: >>> >>> Hi Ahmet! Yes, I think it should be documented in the release notes. >> >> >> Great. +Vitaly, do you want to add the breaking change to the release notes, >> since this was related your change. >> >>> >>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a >>> configuration option to the existing AvroCoder? >> >> >> I am not sure I am the best person to answer this. Second option, of adding >> a configuration to the existing AvroCoder, rather than creating a new coder >> makes more sense to me. >> >> That said, people who might have an opinion: /cc @Ismaël Mejía @Kenneth >> Knowles @Lukasz Cwik +Vitaly >> >>> >>> >>> Thanks, >>> Claire >>> >>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote: >>>> >>>> Is this something we need to add to the 2.30.0 release notes >>>> (https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? >>>> >>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote: >>>>> >>>>> Hello! Good catch, I'm taking a look, but it looks like you're >>>>> entirely correct and there isn't any obvious workaround. I guess you >>>>> could regenerate every SpecificRecord class in order to add the >>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be >>>>> necessary. >>>>> >>>>> From the Avro perspective, we should always have been using >>>>> SpecificDatumReader/Writer for all generated SpecificRecords... We >>>>> would still have the same Utf8 and .toString problems, but at least >>>>> there would be no change in behaviour during migration :/ >>>>> >>>>> As a side note, the Apache Avro project should probably reconsider >>>>> whether the Utf8 class still adds any value with modern JVMs! If I >>>>> understand correctly, it was originally in place because Hadoop had a >>>>> performance boost when it could reuse mutable data containers. >>>>> >>>>> Moving forward, I think your suggestion is the most pragmatic: either >>>>> add a configuration option to AvroCoder to always drop to ReflectData, >>>>> or explicitly provide a ReflectAvroCoder that only uses reflection. >>>>> >>>>> I took the liberty of creating the JIRA >>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could >>>>> create an link an Avro issue! Please feel free to update if I missed >>>>> anything. >>>>> >>>>> Best regards, Ryan >>>>> >>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty >>>>> <[email protected]> wrote: >>>>> > >>>>> > Hi all, >>>>> > >>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some >>>>> > unexpected runtime issues due to changes from BEAM-2303. This PR >>>>> > updated AvroCoder to use SpecificDatum{Reader,Writer} instead >>>>> > ofReflectDatum{Reader,Writer} in its implementation. >>>>> > >>>>> > When using the Reflect* suite, Avro string fields have getters/setters >>>>> > defined with a CharSequence signature, but are by default decoded as >>>>> > java.lang.Strings [1]. But the Specific* suitehas a different default >>>>> > behavior for decoding Avro string fields: unless the Avro schema >>>>> > property "java-class" is set to "java.lang.String", the decoded >>>>> > CharSequences will by default be implemented as >>>>> > org.apache.avro.util.Utf8 objects [2]. >>>>> > >>>>> > This is causing some migration pain for us as we're having to either >>>>> > add the java-class property to all string field schemas, or call >>>>> > .toString on a lot of fields we could just cast before. Additionally, >>>>> > Utf8 isn't Serializable and there's no default Coder representation for >>>>> > it. Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as >>>>> > well.I created a quick Gist to demonstrate the issue: [3]. >>>>> > >>>>> > I'm wondering if there's any possibility of making the use of Reflect* >>>>> > vs Specific* configurable in AvroCoder, or maybe setting a default >>>>> > String type in the coder constructor. If not, maybe this change should >>>>> > be documented in the release notes? >>>>> > >>>>> > Thanks, >>>>> > Claire
