Claire and I discussed this a bit earlier, and I asked her to submit a PR. It is found: https://github.com/apache/beam/pull/15292
On Wed, Aug 4, 2021 at 2:21 PM Kirill Panarin <[email protected]> wrote: > +1 to what Claire is proposing. Would be nice to have this merged to avoid > fixing things on our side! > > On 2021/08/02 14:58:11, Claire McGinty <[email protected]> > wrote: > > hi, I wanted to ping this thread again! It would be really helpful to > know > > whether this is something that can be eventually fixed in Beam, or > whether > > we'll have to make the changes on our end. > > > > - Claire > > > > On Tue, Jul 27, 2021 at 8:06 AM Claire McGinty < > [email protected]> > > wrote: > > > > > Right; since ReflectData has just the one instance per classloader > using > > > it in the API is purely stylistic to match the other signatures taking > in > > > Class<T>/Schema :) open to any changes, the Boolean flag option is > probably > > > clearer I can do whichever option is most in line with Beam style! > > > > > > -Claire > > > > > > On Tue, Jul 27, 2021 at 5:47 AM Ryan Skraba <[email protected]> wrote: > > > > > >> Hello! I took a quick look -- I think there's some potential > > >> confusion on this line[1] and the reflectData argument being passed > > >> into the new constructor. > > >> > > >> If I'm reading correctly, the argument passed in is never actually > > >> used in the eventual ReflectDatumReader/Writer, and it's a different > > >> type than the "this.reflectData" member in the instance. > > >> > > >> To restore the original behaviour, I'd probably recommend just passing > > >> in a boolean argument instead, something very explicit along the lines > > >> of "useReflectionOnSpecificData", or "alwaysUseAvroReflect". That's > > >> also the reason to consider a very simple AvroReflectCoder.of(...) > > >> instead of an AvroCoder.of(x, y, true) factory method for readability, > > >> like what was done with AvroGenericCoder. > > >> > > >> It would be easier to comment on a PR, don't hesitate! > > >> > > >> All my best, Ryan > > >> > > >> [1] > > >> > https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311 > > >> > > >> > > >> On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty > > >> <[email protected]> wrote: > > >> > > > >> > Thanks! I put up a branch with a possible solution for adding the > > >> Reflect option to AvroCoder with as minimal a code change as possible > [1] - > > >> would love to get anyone's thoughts on this. > > >> > > > >> > - Claire > > >> > > > >> > On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> > wrote: > > >> >> > > >> >> > > >> >> > > >> >> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty < > > >> [email protected]> wrote: > > >> >>> > > >> >>> Hi Ahmet! Yes, I think it should be documented in the release > notes. > > >> >> > > >> >> > > >> >> Great. +Vitaly, do you want to add the breaking change to the > release > > >> notes, since this was related your change. > > >> >> > > >> >>> > > >> >>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder > or a > > >> configuration option to the existing AvroCoder? > > >> >> > > >> >> > > >> >> I am not sure I am the best person to answer this. Second option, > of > > >> adding a configuration to the existing AvroCoder, rather than > creating a > > >> new coder makes more sense to me. > > >> >> > > >> >> That said, people who might have an opinion: /cc @Ismaël Mejía > > >> @Kenneth Knowles @Lukasz Cwik +Vitaly > > >> >> > > >> >>> > > >> >>> > > >> >>> Thanks, > > >> >>> Claire > > >> >>> > > >> >>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> > wrote: > > >> >>>> > > >> >>>> Is this something we need to add to the 2.30.0 release notes ( > > >> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? > > >> >>>> > > >> >>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> > wrote: > > >> >>>>> > > >> >>>>> Hello! Good catch, I'm taking a look, but it looks like you're > > >> >>>>> entirely correct and there isn't any obvious workaround. I > guess > > >> you > > >> >>>>> could regenerate every SpecificRecord class in order to add the > > >> >>>>> "java-class" or "avro.java.string" annotation, but that > shouldn't be > > >> >>>>> necessary. > > >> >>>>> > > >> >>>>> From the Avro perspective, we should always have been using > > >> >>>>> SpecificDatumReader/Writer for all generated > SpecificRecords... We > > >> >>>>> would still have the same Utf8 and .toString problems, but at > least > > >> >>>>> there would be no change in behaviour during migration :/ > > >> >>>>> > > >> >>>>> As a side note, the Apache Avro project should probably > reconsider > > >> >>>>> whether the Utf8 class still adds any value with modern JVMs! > If I > > >> >>>>> understand correctly, it was originally in place because Hadoop > had > > >> a > > >> >>>>> performance boost when it could reuse mutable data containers. > > >> >>>>> > > >> >>>>> Moving forward, I think your suggestion is the most pragmatic: > > >> either > > >> >>>>> add a configuration option to AvroCoder to always drop to > > >> ReflectData, > > >> >>>>> or explicitly provide a ReflectAvroCoder that only uses > reflection. > > >> >>>>> > > >> >>>>> I took the liberty of creating the JIRA > > >> >>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I > could > > >> >>>>> create an link an Avro issue! Please feel free to update if I > > >> missed > > >> >>>>> anything. > > >> >>>>> > > >> >>>>> Best regards, Ryan > > >> >>>>> > > >> >>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty > > >> >>>>> <[email protected]> wrote: > > >> >>>>> > > > >> >>>>> > Hi all, > > >> >>>>> > > > >> >>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some > > >> unexpected runtime issues due to changes from BEAM-2303. This PR > updated > > >> AvroCoder to use SpecificDatum{Reader,Writer} instead > > >> ofReflectDatum{Reader,Writer} in its implementation. > > >> >>>>> > > > >> >>>>> > When using the Reflect* suite, Avro string fields have > > >> getters/setters defined with a CharSequence signature, but are by > default > > >> decoded as java.lang.Strings [1]. But the Specific* suitehas a > different > > >> default behavior for decoding Avro string fields: unless the Avro > schema > > >> property "java-class" is set to "java.lang.String", the decoded > > >> CharSequences will by default be implemented as > org.apache.avro.util.Utf8 > > >> objects [2]. > > >> >>>>> > > > >> >>>>> > This is causing some migration pain for us as we're having to > > >> either add the java-class property to all string field schemas, or > call > > >> .toString on a lot of fields we could just cast before. Additionally, > Utf8 > > >> isn't Serializable and there's no default Coder representation for it. > > >> Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as > well.I > > >> created a quick Gist to demonstrate the issue: [3]. > > >> >>>>> > > > >> >>>>> > I'm wondering if there's any possibility of making the use of > > >> Reflect* vs Specific* configurable in AvroCoder, or maybe setting a > default > > >> String type in the coder constructor. If not, maybe this change > should be > > >> documented in the release notes? > > >> >>>>> > > > >> >>>>> > Thanks, > > >> >>>>> > Claire > > >> > > > > > >
