Thanks! I put up a branch with a possible solution for adding the Reflect
option to AvroCoder with as minimal a code change as possible [1]
<https://github.com/apache/beam/compare/master...clairemcginty:avro_reflect_coder_option?expand=1>
-
would love to get anyone's thoughts on this.

- Claire

On Wed, Jul 21, 2021 at 7:00 PM Ahmet Altay <[email protected]> wrote:

>
>
> On Wed, Jul 21, 2021 at 9:37 AM Claire McGinty <[email protected]>
> wrote:
>
>> Hi Ahmet! Yes, I think it should be documented in the release notes.
>>
>
> Great. +Vitaly, do you want to add the breaking change to the release
> notes, since this was related your change.
>
>
>> What do you think of Ryan’s suggestion to add a ReflectAvroCoder or a
>> configuration option to the existing AvroCoder?
>>
>
> I am not sure I am the best person to answer this. Second option, of
> adding a configuration to the existing AvroCoder, rather than creating a
> new coder makes more sense to me.
>
> That said, people who might have an opinion: /cc @Ismaël Mejía
> <[email protected]> @Kenneth Knowles <[email protected]> @Lukasz Cwik
> <[email protected]> +Vitaly
>
>
>>
>> Thanks,
>> Claire
>>
>> On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote:
>>
>>> Is this something we need to add to the 2.30.0 release notes (
>>> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change?
>>>
>>> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote:
>>>
>>>> Hello!  Good catch, I'm taking a look, but it looks like you're
>>>> entirely correct and there isn't any obvious workaround.  I guess you
>>>> could regenerate every SpecificRecord class in order to add the
>>>> "java-class" or "avro.java.string" annotation, but that shouldn't be
>>>> necessary.
>>>>
>>>> From the Avro perspective, we should always have been using
>>>> SpecificDatumReader/Writer for all generated SpecificRecords...  We
>>>> would still have the same Utf8 and .toString problems, but at least
>>>> there would be no change in behaviour during migration :/
>>>>
>>>> As a side note, the Apache Avro project should probably reconsider
>>>> whether the Utf8 class still adds any value with modern JVMs!  If I
>>>> understand correctly, it was originally in place because Hadoop had a
>>>> performance boost when it could reuse mutable data containers.
>>>>
>>>> Moving forward, I think your suggestion is the most pragmatic: either
>>>> add a configuration option to AvroCoder to always drop to ReflectData,
>>>> or explicitly provide a ReflectAvroCoder that only uses reflection.
>>>>
>>>> I took the liberty of creating the JIRA
>>>> https://issues.apache.org/jira/browse/BEAM-12628 JIRA, so I could
>>>> create an link an Avro issue!  Please feel free to update if I missed
>>>> anything.
>>>>
>>>> Best regards, Ryan
>>>>
>>>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty
>>>> <[email protected]> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some
>>>> unexpected runtime issues due to changes from BEAM-2303. This PR updated
>>>> AvroCoder to use SpecificDatum{Reader,Writer} instead
>>>> ofReflectDatum{Reader,Writer} in its implementation.
>>>> >
>>>> > When using the Reflect* suite, Avro string fields have
>>>> getters/setters defined with a CharSequence signature, but are by default
>>>> decoded as java.lang.Strings [1]. But the Specific* suitehas a different
>>>> default behavior for decoding Avro string fields: unless the Avro schema
>>>> property "java-class" is set to "java.lang.String", the decoded
>>>> CharSequences will by default be implemented as org.apache.avro.util.Utf8
>>>> objects [2].
>>>> >
>>>> > This is causing some migration pain for us as we're having to either
>>>> add the java-class property to all string field schemas, or call .toString
>>>> on a lot of fields we could just cast before. Additionally, Utf8 isn't
>>>> Serializable and there's no default Coder representation for it. Beam's
>>>> AvroSink/AvroSource still use the Reflect* reader/writer, as well.I created
>>>> a quick Gist to demonstrate the issue: [3].
>>>> >
>>>> > I'm wondering if there's any possibility of making the use of
>>>> Reflect* vs Specific* configurable in AvroCoder, or maybe setting a default
>>>> String type in the coder constructor.  If not, maybe this change should be
>>>> documented in the release notes?
>>>> >
>>>> > Thanks,
>>>> > Claire
>>>>
>>>

Reply via email to