Thanks! I put up a branch with a possible solution for adding the Reflect option to AvroCoder with as minimal a code change as possible [1] <https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1> - would love to get anyone's thoughts on this.
- Claire On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote: > > > On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty <[email protected]> > wrote: > >> Hi Ahmet! Yes, I think it should be documented in the release notes. >> > > Great. +Vitaly, do you want to add the breaking change to the release > notes, since this was related your change. > > >> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a >> configuration option to the existing AvroCoder? >> > > I am not sure I am the best person to answer this. Second option, of > adding a configuration to the existing AvroCoder, rather than creating a > new coder makes more sense to me. > > That said, people who might have an opinion: /cc @Ismaël Mejía > <[email protected]> @Kenneth Knowles <[email protected]> @Lukasz Cwik > <[email protected]> +Vitaly > > >> >> Thanks, >> Claire >> >> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote: >> >>> Is this something we need to add to the 2.30.0 release notes ( >>> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change? >>> >>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote: >>> >>>> Hello! Good catch, I'm taking a look, but it looks like you're >>>> entirely correct and there isn't any obvious workaround. I guess you >>>> could regenerate every SpecificRecord class in order to add the >>>> "java-class" or "avro.java.string" annotation, but that shouldn't be >>>> necessary. >>>> >>>> From the Avro perspective, we should always have been using >>>> SpecificDatumReader/Writer for all generated SpecificRecords... We >>>> would still have the same Utf8 and .toString problems, but at least >>>> there would be no change in behaviour during migration :/ >>>> >>>> As a side note, the Apache Avro project should probably reconsider >>>> whether the Utf8 class still adds any value with modern JVMs! If I >>>> understand correctly, it was originally in place because Hadoop had a >>>> performance boost when it could reuse mutable data containers. >>>> >>>> Moving forward, I think your suggestion is the most pragmatic: either >>>> add a configuration option to AvroCoder to always drop to ReflectData, >>>> or explicitly provide a ReflectAvroCoder that only uses reflection. >>>> >>>> I took the liberty of creating the JIRA >>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could >>>> create an link an Avro issue! Please feel free to update if I missed >>>> anything. >>>> >>>> Best regards, Ryan >>>> >>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty >>>> <[email protected]> wrote: >>>> > >>>> > Hi all, >>>> > >>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some >>>> unexpected runtime issues due to changes from BEAM-2303. This PR updated >>>> AvroCoder to use SpecificDatum{Reader,Writer} instead >>>> ofReflectDatum{Reader,Writer} in its implementation. >>>> > >>>> > When using the Reflect* suite, Avro string fields have >>>> getters/setters defined with a CharSequence signature, but are by default >>>> decoded as java.lang.Strings [1]. But the Specific* suitehas a different >>>> default behavior for decoding Avro string fields: unless the Avro schema >>>> property "java-class" is set to "java.lang.String", the decoded >>>> CharSequences will by default be implemented as org.apache.avro.util.Utf8 >>>> objects [2]. >>>> > >>>> > This is causing some migration pain for us as we're having to either >>>> add the java-class property to all string field schemas, or call .toString >>>> on a lot of fields we could just cast before. Additionally, Utf8 isn't >>>> Serializable and there's no default Coder representation for it. Beam's >>>> AvroSink/AvroSource still use the Reflect* reader/writer, as well.I created >>>> a quick Gist to demonstrate the issue: [3]. >>>> > >>>> > I'm wondering if there's any possibility of making the use of >>>> Reflect* vs Specific* configurable in AvroCoder, or maybe setting a default >>>> String type in the coder constructor. If not, maybe this change should be >>>> documented in the release notes? >>>> > >>>> > Thanks, >>>> > Claire >>>> >>>
