Right; since ReflectData has just the one instance per classloader using it
in the API is purely stylistic to match the other signatures taking in
Class<T>/Schema :) open to any changes, the Boolean flag option is probably
clearer I can do whichever option is most in line with Beam style!

-Claire

On Tue, Jul 27, 2021 at 5:47 AM Ryan Skraba <[email protected]> wrote:

> Hello!  I took a quick look -- I think there's some potential
> confusion on this line[1] and the reflectData argument being passed
> into the new constructor.
>
> If I'm reading correctly, the argument passed in is never actually
> used in the eventual ReflectDatumReader/Writer, and it's a different
> type than the "this.reflectData" member in the instance.
>
> To restore the original behaviour, I'd probably recommend just passing
> in a boolean argument instead, something very explicit along the lines
> of "useReflectionOnSpecificData", or "alwaysUseAvroReflect".  That's
> also the reason to consider a very simple AvroReflectCoder.of(...)
> instead of an AvroCoder.of(x, y, true) factory method for readability,
> like what was done with AvroGenericCoder.
>
> It would be easier to comment on a PR, don't hesitate!
>
> All my best, Ryan
>
> [1]
> https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1#diff-e875a9933286d97dd3d3d21a61e6f11c0e35624e97411c1b98f1ac672c21045dR311
>
>
> On Mon, Jul 26, 2021 at 6:42 PM Claire McGinty
> <[email protected]> wrote:
> >
> > Thanks! I put up a branch with a possible solution for adding the
> Reflect option to AvroCoder with as minimal a code change as possible [1] -
> would love to get anyone's thoughts on this.
> >
> > - Claire
> >
> > On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote:
> >>
> >>
> >>
> >> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty <
> [email protected]> wrote:
> >>>
> >>> Hi Ahmet! Yes, I think it should be documented in the release notes.
> >>
> >>
> >> Great. +Vitaly, do you want to add the breaking change to the release
> notes, since this was related your change.
> >>
> >>>
> >>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a
> configuration option to the existing AvroCoder?
> >>
> >>
> >> I am not sure I am the best person to answer this. Second option, of
> adding a configuration to the existing AvroCoder, rather than creating a
> new coder makes more sense to me.
> >>
> >> That said, people who might have an opinion: /cc @Ismaël Mejía @Kenneth
> Knowles @Lukasz Cwik +Vitaly
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Claire
> >>>
> >>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote:
> >>>>
> >>>> Is this something we need to add to the 2.30.0 release notes (
> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change?
> >>>>
> >>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote:
> >>>>>
> >>>>> Hello!  Good catch, I'm taking a look, but it looks like you're
> >>>>> entirely correct and there isn't any obvious workaround.  I guess you
> >>>>> could regenerate every SpecificRecord class in order to add the
> >>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be
> >>>>> necessary.
> >>>>>
> >>>>> From the Avro perspective, we should always have been using
> >>>>> SpecificDatumReader/Writer for all generated SpecificRecords...  We
> >>>>> would still have the same Utf8 and .toString problems, but at least
> >>>>> there would be no change in behaviour during migration :/
> >>>>>
> >>>>> As a side note, the Apache Avro project should probably reconsider
> >>>>> whether the Utf8 class still adds any value with modern JVMs!  If I
> >>>>> understand correctly, it was originally in place because Hadoop had a
> >>>>> performance boost when it could reuse mutable data containers.
> >>>>>
> >>>>> Moving forward, I think your suggestion is the most pragmatic: either
> >>>>> add a configuration option to AvroCoder to always drop to
> ReflectData,
> >>>>> or explicitly provide a ReflectAvroCoder that only uses reflection.
> >>>>>
> >>>>> I took the liberty of creating the JIRA
> >>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could
> >>>>> create an link an Avro issue!  Please feel free to update if I missed
> >>>>> anything.
> >>>>>
> >>>>> Best regards, Ryan
> >>>>>
> >>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty
> >>>>> <[email protected]> wrote:
> >>>>> >
> >>>>> > Hi all,
> >>>>> >
> >>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some
> unexpected runtime issues due to changes from BEAM-2303. This PR updated
> AvroCoder to use SpecificDatum{Reader,Writer} instead
> ofReflectDatum{Reader,Writer} in its implementation.
> >>>>> >
> >>>>> > When using the Reflect* suite, Avro string fields have
> getters/setters defined with a CharSequence signature, but are by default
> decoded as java.lang.Strings [1]. But the Specific* suitehas a different
> default behavior for decoding Avro string fields: unless the Avro schema
> property "java-class" is set to "java.lang.String", the decoded
> CharSequences will by default be implemented as org.apache.avro.util.Utf8
> objects [2].
> >>>>> >
> >>>>> > This is causing some migration pain for us as we're having to
> either add the java-class property to all string field schemas, or call
> .toString on a lot of fields we could just cast before. Additionally, Utf8
> isn't Serializable and there's no default Coder representation for it.
> Beam's AvroSink/AvroSource still use the Reflect* reader/writer, as well.I
> created a quick Gist to demonstrate the issue: [3].
> >>>>> >
> >>>>> > I'm wondering if there's any possibility of making the use of
> Reflect* vs Specific* configurable in AvroCoder, or maybe setting a default
> String type in the coder constructor.  If not, maybe this change should be
> documented in the release notes?
> >>>>> >
> >>>>> > Thanks,
> >>>>> > Claire
>

Reply via email to