[ 
https://issues.apache.org/jira/browse/BEAM-2303?focusedWorklogId=623656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623656
 ]

ASF GitHub Bot logged work on BEAM-2303:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jul/21 15:00
            Start Date: 16/Jul/21 15:00
    Worklog Time Spent: 10m 
      Work Description: clairemcginty edited a comment on pull request #14410:
URL: https://github.com/apache/beam/pull/14410#issuecomment-880838488


   Hi @iemejia / @pabloem / @Amar3tto. This PR introduced some hidden bugs for 
us when upgrading from Beam 2.29.0 to 2.30.0. It changes the default 
`CharSequence` representation in decoded Avro string fields. When using 
`ReflectDatum{Reader,Writer}`, `CharSequence`s are backed by `String`s by default 
[[1]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java#L229).
 This switch to `SpecificDatum{Reader,Writer}` means that, unless the Avro 
field property `java-class` is set to `java.lang.String` for all string fields, 
`CharSequence`s are now backed by `org.apache.avro.util.Utf8` by default 
[[2]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L408).
 A lot of our users were relying on the default representation being `String` 
and are now seeing runtime errors in their pipelines. Finally, `Utf8` isn't 
serializable, so there's no default `Coder` implementation for it, and users 
would have to convert it to a Java string anyway if they wanted to do a GBK 
operation on an Avro field, for example. I created a quick Gist to demonstrate 
the problem: 
[[3]](https://gist.github.com/clairemcginty/97ee6b33c0b5633d5d42d29b1d057d85). 
   
   Is this something I could bring to the ***@ or ****@ mailing list? Let me 
know what you think.
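
   As a rough illustration of the failure mode described above, here is a 
minimal stdlib-only sketch. `ByteBackedText` is a hypothetical stand-in for a 
non-`String` `CharSequence`; it is not Avro's `Utf8`, and the helper names 
`asString` and `unsafeCastWorks` are assumptions made for this example only. It 
shows how code that assumes a `String` can break when the backing type changes, 
and how calling `toString()` at the boundary avoids depending on that type.

```java
import java.nio.charset.StandardCharsets;

public class CharSequenceBackingDemo {

    /**
     * Hypothetical byte-backed CharSequence used only for illustration.
     * It is NOT org.apache.avro.util.Utf8 and does not mirror its API.
     */
    static final class ByteBackedText implements CharSequence {
        private final byte[] bytes;

        ByteBackedText(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Normalizes any CharSequence to a String, independent of the backing type. */
    static String asString(CharSequence value) {
        return value == null ? null : value.toString();
    }

    /** Mimics user code that assumed the field was a String and cast it. */
    static boolean unsafeCastWorks(CharSequence value) {
        try {
            String s = (String) value; // throws if value is not a String
            return s != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CharSequence stringBacked = "user-1";
        CharSequence byteBacked = new ByteBackedText("user-1");

        // String.equals only matches other String instances.
        System.out.println("equals across types: " + "user-1".equals(byteBacked));
        System.out.println("cast on String-backed: " + unsafeCastWorks(stringBacked));
        System.out.println("cast on byte-backed: " + unsafeCastWorks(byteBacked));
        // Converting explicitly restores the comparison users expected.
        System.out.println("after asString: " + "user-1".equals(asString(byteBacked)));
    }
}
```

   Normalizing with `toString()` where values leave the decoded record is one 
possible workaround, not a statement about how Beam or Avro should behave by 
default.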


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 623656)
    Time Spent: 2h  (was: 1h 50m)

> Add SpecificData to AvroCoder
> -----------------------------
>
>                 Key: BEAM-2303
>                 URL: https://issues.apache.org/jira/browse/BEAM-2303
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.1.0
>            Reporter: Arvid Heise
>            Assignee: Vitaly Terentyev
>            Priority: P3
>              Labels: Clarified
>             Fix For: 2.30.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The AvroCoder currently supports GenericData and ReflectData, but not 
> SpecificData.
> It should be relatively easy to incorporate it by expanding the logic while 
> constructing the Reader and Writer by also checking if the type implements 
> the SpecificRecord interface. It would greatly speed up (de-)serialization of 
> Avro-generated java classes.
> {code}
>             return myCoder.getType().equals(GenericRecord.class)
>                 ? new GenericDatumReader<T>(myCoder.getSchema())
>                 : new ReflectDatumReader<T>(
>                     myCoder.getSchema(), myCoder.getSchema(), myCoder.reflectData.get());
> {code}
> should be
> {code}
>             if (myCoder.getType().equals(GenericRecord.class)) {
>                 return new GenericDatumReader<T>(myCoder.getSchema());
>             }
>             if (SpecificRecord.class.isAssignableFrom(myCoder.getType())) {
>                 return new SpecificDatumReader<T>(myCoder.getType());
>             }
>             return new ReflectDatumReader<T>(
>                 myCoder.getSchema(), myCoder.getSchema(), myCoder.reflectData.get());
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
