Hi Ahmet! Yes, I think it should be documented in the release notes. What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a configuration option to the existing AvroCoder?
Thanks, Claire On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote: > Is this something we need to add to the 2.30.0 release notes ( > https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? > > On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote: > >> Hello! Good catch, I'm taking a look, but it looks like you're >> entirely correct and there isn't any obvious workaround. I guess you >> could regenerate every SpecificRecord class in order to add the >> "java-class" or "avro.java.string" annotation, but that shouldn't be >> necessary. >> >> From the Avro perspective, we should always have been using >> SpecificDatumReader/Writer for all generated SpecificRecords... We >> would still have the same Utf8 and .toString problems, but at least >> there would be no change in behaviour during migration :/ >> >> As a side note, the Apache Avro project should probably reconsider >> whether the Utf8 class still adds any value with modern JVMs! If I >> understand correctly, it was originally in place because Hadoop had a >> performance boost when it could reuse mutable data containers. >> >> Moving forward, I think your suggestion is the most pragmatic: either >> add a configuration option to AvroCoder to always drop to ReflectData, >> or explicitly provide a ReflectAvroCoder that only uses reflection. >> >> I took the liberty of creating the JIRA >> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could >> create an link an Avro issue! Please feel free to update if I missed >> anything. >> >> Best regards, Ryan >> >> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty >> <[email protected]> wrote: >> > >> > Hi all, >> > >> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some >> unexpected runtime issues due to changes from BEAM-2303. This PR updated >> AvroCoder to use SpecificDatum{Reader,Writer} instead >> ofReflectDatum{Reader,Writer} in its implementation. >> > >> > When using the Reflect* suite, Avro string fields have getters/setters >> defined with a CharSequence signature, but are by default decoded as >> java.lang.Strings [1]. But the Specific* suitehas a different default >> behavior for decoding Avro string fields: unless the Avro schema property >> "java-class" is set to "java.lang.String", the decoded CharSequences will >> by default be implemented as org.apache.avro.util.Utf8 objects [2]. >> > >> > This is causing some migration pain for us as we're having to either >> add the java-class property to all string field schemas, or call .toString >> on a lot of fields we could just cast before. Additionally, Utf8 isn't >> Serializable and there's no default Coder representation for it. Beam's >> AvroSink/AvroSource still use the Reflect* reader/writer, as well.I created >> a quick Gist to demonstrate the issue: [3]. >> > >> > I'm wondering if there's any possibility of making the use of Reflect* >> vs Specific* configurable in AvroCoder, or maybe setting a default String >> type in the coder constructor. If not, maybe this change should be >> documented in the release notes? >> > >> > Thanks, >> > Claire >> >
