Right; since ReflectData has just the one instance per classloader using it in the API is purely stylistic to match the other signatures taking in Class<T>/Schema :) open to any changes, the Boolean flag option is probably clearer I can do whichever option is most in line with Beam style!
-Claire On Tue, Jul 27, 2021 at 5:47 AM Ryan Skraba <[email protected]> wrote: > Hello! I took a quick look -- I think there's some potential > confusion on this line[1] and the reflectData argument being passed > into the new constructor. > > If I'm reading correctly, the argument passed in is never actually > used in the eventual ReflectDatumReader/Writer, and it's a different > type than the "this.reflectData" member in the instance. > > To restore the original behaviour, I'd probably recommend just passing > in a boolean argument instead, something very explicit along the lines > of "useReflectionOnSpecificData", or "alwaysUseAvroReflect". That's > also the reason to consider a very simple AvroReflectCoder.of(...) > instead of an AvroCoder.of(x, y, true) factory method for readability, > like what was done with AvroGenericCoder. > > It would be easier to comment on a PR, don't hesitate! > > All my best, Ryan > > [1] > https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311 > > > On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty > <[email protected]> wrote: > > > > Thanks! I put up a branch with a possible solution for adding the > Reflect option to AvroCoder with as minimal a code change as possible [1] - > would love to get anyone's thoughts on this. > > > > - Claire > > > > On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote: > >> > >> > >> > >> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty < > [email protected]> wrote: > >>> > >>> Hi Ahmet! Yes, I think it should be documented in the release notes. > >> > >> > >> Great. +Vitaly, do you want to add the breaking change to the release > notes, since this was related your change. > >> > >>> > >>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a > configuration option to the existing AvroCoder? > >> > >> > >> I am not sure I am the best person to answer this. Second option, of > adding a configuration to the existing AvroCoder, rather than creating a > new coder makes more sense to me. > >> > >> That said, people who might have an opinion: /cc @Ismaël Mejía @Kenneth > Knowles @Lukasz Cwik +Vitaly > >> > >>> > >>> > >>> Thanks, > >>> Claire > >>> > >>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote: > >>>> > >>>> Is this something we need to add to the 2.30.0 release notes ( > https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? > >>>> > >>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote: > >>>>> > >>>>> Hello! Good catch, I'm taking a look, but it looks like you're > >>>>> entirely correct and there isn't any obvious workaround. I guess you > >>>>> could regenerate every SpecificRecord class in order to add the > >>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be > >>>>> necessary. > >>>>> > >>>>> From the Avro perspective, we should always have been using > >>>>> SpecificDatumReader/Writer for all generated SpecificRecords... We > >>>>> would still have the same Utf8 and .toString problems, but at least > >>>>> there would be no change in behaviour during migration :/ > >>>>> > >>>>> As a side note, the Apache Avro project should probably reconsider > >>>>> whether the Utf8 class still adds any value with modern JVMs! If I > >>>>> understand correctly, it was originally in place because Hadoop had a > >>>>> performance boost when it could reuse mutable data containers. > >>>>> > >>>>> Moving forward, I think your suggestion is the most pragmatic: either > >>>>> add a configuration option to AvroCoder to always drop to > ReflectData, > >>>>> or explicitly provide a ReflectAvroCoder that only uses reflection. > >>>>> > >>>>> I took the liberty of creating the JIRA > >>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could > >>>>> create an link an Avro issue! Please feel free to update if I missed > >>>>> anything. > >>>>> > >>>>> Best regards, Ryan > >>>>> > >>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty > >>>>> <[email protected]> wrote: > >>>>> > > >>>>> > Hi all, > >>>>> > > >>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some > unexpected runtime issues due to changes from BEAM-2303. This PR updated > AvroCoder to use SpecificDatum{Reader,Writer} instead > ofReflectDatum{Reader,Writer} in its implementation. > >>>>> > > >>>>> > When using the Reflect* suite, Avro string fields have > getters/setters defined with a CharSequence signature, but are by default > decoded as java.lang.Strings [1]. But the Specific* suitehas a different > default behavior for decoding Avro string fields: unless the Avro schema > property "java-class" is set to "java.lang.String", the decoded > CharSequences will by default be implemented as org.apache.avro.util.Utf8 > objects [2]. > >>>>> > > >>>>> > This is causing some migration pain for us as we're having to > either add the java-class property to all string field schemas, or call > .toString on a lot of fields we could just cast before. Additionally, Utf8 > isn't Serializable and there's no default Coder representation for it. > Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as well.I > created a quick Gist to demonstrate the issue: [3]. > >>>>> > > >>>>> > I'm wondering if there's any possibility of making the use of > Reflect* vs Specific* configurable in AvroCoder, or maybe setting a default > String type in the coder constructor. If not, maybe this change should be > documented in the release notes? > >>>>> > > >>>>> > Thanks, > >>>>> > Claire >
