[ 
https://issues.apache.org/jira/browse/BEAM-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415471#comment-17415471
 ] 

Ryan Skraba commented on BEAM-12628:
------------------------------------

 [~clairemcginty] Now that your fix is in place, can you let us know if it's 
tricky?  Since you're actually in production, I'd defer to your judgement!  I 
think we're OK because the fix makes this configurable. I'm tempted to say 
that it's a breaking change that we should have caught sooner and definitely 
released a fix for ASAP, but using SpecificData over ReflectData is really the 
right and expected thing to do for future users.

On the mailing list, I think I was mistaken when I wrote:
{quote}As a side note, the Apache Avro project should probably reconsider 
whether the Utf8 class still adds any value with modern JVMs! If I understand 
correctly, it was originally in place because Hadoop had a performance boost 
when it could reuse mutable data containers.
{quote}
As it turns out, Utf8 is still relevant: the bytes -> char conversion is still 
expensive enough that short-circuiting it is worthwhile when we can.
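
To make the short-circuiting idea concrete, here is a minimal sketch of a 
wrapper that keeps the raw UTF-8 bytes and only decodes them on demand. It is 
not Avro's Utf8 implementation: the LazyUtf8 class and its decodeCount counter 
are hypothetical names invented for this illustration, and only the JDK is 
assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, NOT org.apache.avro.util.Utf8: it only illustrates
// deferring the bytes -> char conversion until a caller needs characters.
final class LazyUtf8 implements CharSequence {
    private final byte[] bytes;  // raw UTF-8 bytes, as they would arrive off the wire
    private String decoded;      // cached result of the bytes -> char conversion
    private int decodeCount;     // instrumentation for this sketch only

    LazyUtf8(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Equality and hashing operate on the bytes, so no decoding is needed.
    @Override
    public boolean equals(Object other) {
        return other instanceof LazyUtf8
                && Arrays.equals(bytes, ((LazyUtf8) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // Decoding happens at most once, on first use.
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(bytes, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return decoded;
    }

    @Override
    public int length() {
        return toString().length();
    }

    @Override
    public char charAt(int index) {
        return toString().charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    int decodeCount() {
        return decodeCount;
    }

    public static void main(String[] args) {
        byte[] wire = "user-42".getBytes(StandardCharsets.UTF_8);
        LazyUtf8 a = new LazyUtf8(wire);
        LazyUtf8 b = new LazyUtf8(wire);
        System.out.println(a.equals(b));     // compared as bytes
        System.out.println(a.decodeCount()); // nothing decoded so far
        System.out.println(a);               // decoding happens here
    }
}
```

A pipeline that only compares, hashes or copies such values never pays for the 
decode, which is the case where keeping a bytes-backed type is worthwhile.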

For point (2) in the description, there's AVRO-200 – I think we should re-raise 
this issue for a future release of Avro, because extracting a single primitive 
key from an Avro record is a very common use case!

 

> AvroCoder changed underlying String class for SpecificRecords
> -------------------------------------------------------------
>
>                 Key: BEAM-12628
>                 URL: https://issues.apache.org/jira/browse/BEAM-12628
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-avro
>    Affects Versions: 2.30.0
>            Reporter: Ryan Skraba
>            Assignee: Claire McGinty
>            Priority: P1
>             Fix For: 2.33.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The AvroCoder changes for BEAM-2303 switched the reader/writer from the Avro 
> {{ReflectDatum*}} classes to the {{SpecificDatum*}} classes.
> Because of the way Avro handles Strings, however, the underlying instances 
> for String data are deserialised as {{org.apache.avro.util.Utf8}} instances 
> instead of {{java.lang.String}}.
> This causes:
> 1. an unexpected behaviour change when migrating to Beam 2.30.0
> 2. potential serialization issues when using these String instances (Utf8 
> instances don't implement Serializable)
> 3. an inconsistent API between {{AvroCoder}} and {{AvroSink}}/{{AvroSource}} 
> (the latter still use {{ReflectDatum*}})
> (Original report on the [mailing 
> list|https://lists.apache.org/x/thread.html/r5d0b975926cc4761f025ecd8df58a31e3f99e522296cc47d82ed5943@%3Cdev.beam.apache.org%3E]
>  and [PR|https://github.com/apache/beam/pull/14410#issuecomment-880838488])
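
Point 2 in the description can be reproduced without Avro on the classpath. 
The sketch below is only an illustration under stated assumptions: RawText is 
a hypothetical stand-in for a CharSequence that does not implement 
Serializable (as the description says of Utf8), and Event is a hypothetical 
Serializable record whose field is typed as CharSequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    // Hypothetical stand-in for a non-Serializable CharSequence.
    static final class RawText implements CharSequence {
        private final String value;

        RawText(String value) {
            this.value = value;
        }

        @Override
        public int length() {
            return value.length();
        }

        @Override
        public char charAt(int index) {
            return value.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Hypothetical record: Serializable itself, but its field may not be.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final CharSequence key;

        Event(CharSequence key) {
            this.key = key;
        }
    }

    // Returns false when Java serialization rejects the object graph.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RawText raw = new RawText("user-42");
        // The record is Serializable, but the RawText field is not.
        System.out.println(canSerialize(new Event(raw)));
        // Converting to java.lang.String first avoids the problem.
        System.out.println(canSerialize(new Event(raw.toString())));
    }
}
```

Calling toString() at the boundary is the simplest workaround in user code; 
the configurable fix discussed in this issue addresses it at the coder level 
instead.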



