hi, I wanted to ping this thread again! It would be really helpful to know whether this is something that can be eventually fixed in Beam, or whether we'll have to make the changes on our end.
- Claire On Tue, Jul 27, 2021 at 8:06 AM Claire McGinty <[email protected]> wrote: > Right; since ReflectData has just the one instance per classloader using > it in the API is purely stylistic to match the other signatures taking in > Class<T>/Schema :) open to any changes, the Boolean flag option is probably > clearer I can do whichever option is most in line with Beam style! > > -Claire > > On Tue, Jul 27, 2021 at 5:47 AM Ryan Skraba <[email protected]> wrote: > >> Hello! I took a quick look -- I think there's some potential >> confusion on this line[1] and the reflectData argument being passed >> into the new constructor. >> >> If I'm reading correctly, the argument passed in is never actually >> used in the eventual ReflectDatumReader/Writer, and it's a different >> type than the "this.reflectData" member in the instance. >> >> To restore the original behaviour, I'd probably recommend just passing >> in a boolean argument instead, something very explicit along the lines >> of "useReflectionOnSpecificData", or "alwaysUseAvroReflect". That's >> also the reason to consider a very simple AvroReflectCoder.of(...) >> instead of an AvroCoder.of(x, y, true) factory method for readability, >> like what was done with AvroGenericCoder. >> >> It would be easier to comment on a PR, don't hesitate! >> >> All my best, Ryan >> >> [1] >> https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311 >> >> >> On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty >> <[email protected]> wrote: >> > >> > Thanks! I put up a branch with a possible solution for adding the >> Reflect option to AvroCoder with as minimal a code change as possible [1] - >> would love to get anyone's thoughts on this. >> > >> > - Claire >> > >> > On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote: >> >> >> >> >> >> >> >> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty < >> [email protected]> wrote: >> >>> >> >>> Hi Ahmet! Yes, I think it should be documented in the release notes. >> >> >> >> >> >> Great. +Vitaly, do you want to add the breaking change to the release >> notes, since this was related your change. >> >> >> >>> >> >>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a >> configuration option to the existing AvroCoder? >> >> >> >> >> >> I am not sure I am the best person to answer this. Second option, of >> adding a configuration to the existing AvroCoder, rather than creating a >> new coder makes more sense to me. >> >> >> >> That said, people who might have an opinion: /cc @Ismaël Mejía >> @Kenneth Knowles @Lukasz Cwik +Vitaly >> >> >> >>> >> >>> >> >>> Thanks, >> >>> Claire >> >>> >> >>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote: >> >>>> >> >>>> Is this something we need to add to the 2.30.0 release notes ( >> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? >> >>>> >> >>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote: >> >>>>> >> >>>>> Hello! Good catch, I'm taking a look, but it looks like you're >> >>>>> entirely correct and there isn't any obvious workaround. I guess >> you >> >>>>> could regenerate every SpecificRecord class in order to add the >> >>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be >> >>>>> necessary. >> >>>>> >> >>>>> From the Avro perspective, we should always have been using >> >>>>> SpecificDatumReader/Writer for all generated SpecificRecords... We >> >>>>> would still have the same Utf8 and .toString problems, but at least >> >>>>> there would be no change in behaviour during migration :/ >> >>>>> >> >>>>> As a side note, the Apache Avro project should probably reconsider >> >>>>> whether the Utf8 class still adds any value with modern JVMs! If I >> >>>>> understand correctly, it was originally in place because Hadoop had >> a >> >>>>> performance boost when it could reuse mutable data containers. >> >>>>> >> >>>>> Moving forward, I think your suggestion is the most pragmatic: >> either >> >>>>> add a configuration option to AvroCoder to always drop to >> ReflectData, >> >>>>> or explicitly provide a ReflectAvroCoder that only uses reflection. >> >>>>> >> >>>>> I took the liberty of creating the JIRA >> >>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could >> >>>>> create an link an Avro issue! Please feel free to update if I >> missed >> >>>>> anything. >> >>>>> >> >>>>> Best regards, Ryan >> >>>>> >> >>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty >> >>>>> <[email protected]> wrote: >> >>>>> > >> >>>>> > Hi all, >> >>>>> > >> >>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some >> unexpected runtime issues due to changes from BEAM-2303. This PR updated >> AvroCoder to use SpecificDatum{Reader,Writer} instead >> ofReflectDatum{Reader,Writer} in its implementation. >> >>>>> > >> >>>>> > When using the Reflect* suite, Avro string fields have >> getters/setters defined with a CharSequence signature, but are by default >> decoded as java.lang.Strings [1]. But the Specific* suitehas a different >> default behavior for decoding Avro string fields: unless the Avro schema >> property "java-class" is set to "java.lang.String", the decoded >> CharSequences will by default be implemented as org.apache.avro.util.Utf8 >> objects [2]. >> >>>>> > >> >>>>> > This is causing some migration pain for us as we're having to >> either add the java-class property to all string field schemas, or call >> .toString on a lot of fields we could just cast before. Additionally, Utf8 >> isn't Serializable and there's no default Coder representation for it. >> Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as well.I >> created a quick Gist to demonstrate the issue: [3]. >> >>>>> > >> >>>>> > I'm wondering if there's any possibility of making the use of >> Reflect* vs Specific* configurable in AvroCoder, or maybe setting a default >> String type in the coder constructor. If not, maybe this change should be >> documented in the release notes? >> >>>>> > >> >>>>> > Thanks, >> >>>>> > Claire >> >
