Hi Ahmet! Yes, I think it should be documented in the release notes. What
do you think of Ryan’s suggestion to add a ReflectAvroCoder or a
configuration option to the existing AvroCoder?

Thanks,
Claire

On Tue, Jul 20, 2021 at 4:15 PM Ahmet Altay <[email protected]> wrote:

> Is this something we need to add to the 2.30.0 release notes (
> https://beam.apache.org/blog/beam-2.30.0/) as a breaking change?
>
> On Fri, Jul 16, 2021 at 7:11 AM Ryan Skraba <[email protected]> wrote:
>
>> Hello!  Good catch, I'm taking a look, but it looks like you're
>> entirely correct and there isn't any obvious workaround.  I guess you
>> could regenerate every SpecificRecord class in order to add the
>> "java-class" or "avro.java.string" annotation, but that shouldn't be
>> necessary.
>>
>> From the Avro perspective, we should always have been using
>> SpecificDatumReader/Writer for all generated SpecificRecords...  We
>> would still have the same Utf8 and .toString problems, but at least
>> there would be no change in behaviour during migration :/
>>
>> As a side note, the Apache Avro project should probably reconsider
>> whether the Utf8 class still adds any value with modern JVMs!  If I
>> understand correctly, it was originally in place because Hadoop had a
>> performance boost when it could reuse mutable data containers.
>>
>> Moving forward, I think your suggestion is the most pragmatic: either
>> add a configuration option to AvroCoder to always drop to ReflectData,
>> or explicitly provide a ReflectAvroCoder that only uses reflection.
>>
>> I took the liberty of creating the JIRA
>> https://issues.apache.org/jira/browse/BEAM-12628 so that I could create
>> and link an Avro issue! Please feel free to update it if I missed
>> anything.
>>
>> Best regards, Ryan
>>
>> On Thu, Jul 15, 2021 at 10:53 PM Claire McGinty
>> <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > When upgrading from Beam 2.29.0 to 2.30.0, we encountered some
>> unexpected runtime issues due to changes from BEAM-2303. This PR updated
>> AvroCoder to use SpecificDatum{Reader,Writer} instead of
>> ReflectDatum{Reader,Writer} in its implementation.
>> >
>> > When using the Reflect* suite, Avro string fields have getters/setters
>> defined with a CharSequence signature, but are by default decoded as
>> java.lang.Strings [1]. But the Specific* suite has a different default
>> behavior for decoding Avro string fields: unless the Avro schema property
>> "java-class" is set to "java.lang.String", the decoded CharSequences will
>> by default be implemented as org.apache.avro.util.Utf8 objects [2].
>> >
>> > This is causing some migration pain for us as we're having to either
>> add the java-class property to all string field schemas, or call .toString
>> on a lot of fields we could just cast before. Additionally, Utf8 isn't
>> Serializable and there's no default Coder representation for it. Beam's
>> AvroSink/AvroSource still use the Reflect* reader/writer, as well. I created
>> a quick Gist to demonstrate the issue: [3].
>> >
>> > I'm wondering if there's any possibility of making the use of Reflect*
>> vs Specific* configurable in AvroCoder, or maybe setting a default String
>> type in the coder constructor.  If not, maybe this change should be
>> documented in the release notes?
>> >
>> > Thanks,
>> > Claire
>>
>
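To illustrate the runtime difference Claire describes: org.apache.avro.util.Utf8 implements CharSequence but is not a java.lang.String. A cast that worked while AvroCoder decoded through Reflect* therefore throws ClassCastException once values arrive as Utf8, and equals() against a String literal returns false. The sketch below sticks to the JDK and assumes StringBuilder as a stand-in for Utf8 (both are CharSequences that are not Strings). The class and helper names are hypothetical.

```java
import java.util.Objects;

/**
 * Illustrates the String-vs-CharSequence pitfall from the thread.
 * Assumption: StringBuilder stands in for org.apache.avro.util.Utf8,
 * since both implement CharSequence without being java.lang.String.
 */
public class CharSequenceMigration {

    /** Pre-2.30.0 style: assumes the decoder produced a java.lang.String. */
    static String castField(CharSequence field) {
        return (String) field; // throws ClassCastException for non-String values
    }

    /** Migration-safe style: works for String, Utf8, or any other CharSequence. */
    static String asString(CharSequence field) {
        return field == null ? null : field.toString();
    }

    public static void main(String[] args) {
        CharSequence reflectStyle = "alice";                     // what Reflect* decoding returned
        CharSequence specificStyle = new StringBuilder("alice"); // stand-in for a decoded Utf8

        System.out.println(castField(reflectStyle));                           // alice
        System.out.println("alice".equals(specificStyle));                     // false: not a String
        System.out.println(Objects.equals(asString(specificStyle), "alice"));  // true

        try {
            castField(specificStyle);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Calling .toString() through a helper such as asString is the workaround mentioned in the thread; adding the "java-class" property to each string field's schema avoids it at the source, at the cost of touching every schema.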

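The alternative of fixing a default String type when the coder is constructed could look roughly like the following JDK-only sketch. StringMode and ConfigurableStringDecoder are hypothetical names, not Beam or Avro API; Avro offers a comparable per-schema choice through the "avro.java.string" property mentioned in the thread, and StringBuilder again stands in for Utf8.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a decoder whose string representation is chosen
 * once, at construction time, instead of being implied by the datum reader.
 */
public class ConfigurableStringDecoder {

    /** Hypothetical option, loosely modelled on Avro's "avro.java.string" values. */
    enum StringMode { STRING, CHAR_SEQUENCE }

    private final StringMode mode;

    ConfigurableStringDecoder(StringMode mode) {
        this.mode = mode;
    }

    /** Decodes UTF-8 bytes into the representation selected at construction time. */
    CharSequence decode(byte[] utf8Bytes) {
        String s = new String(utf8Bytes, StandardCharsets.UTF_8);
        switch (mode) {
            case STRING:
                return s;                    // callers may safely cast to String
            case CHAR_SEQUENCE:
            default:
                return new StringBuilder(s); // stand-in for a mutable, reusable Utf8
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "alice".getBytes(StandardCharsets.UTF_8);
        ConfigurableStringDecoder reflectLike = new ConfigurableStringDecoder(StringMode.STRING);
        ConfigurableStringDecoder specificLike = new ConfigurableStringDecoder(StringMode.CHAR_SEQUENCE);
        System.out.println(reflectLike.decode(bytes) instanceof String);  // true
        System.out.println(specificLike.decode(bytes) instanceof String); // false
    }
}
```

Keeping the choice in the constructor would let existing pipelines opt back into String decoding without regenerating their SpecificRecord classes.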