clairemcginty commented on pull request #14410:
URL: https://github.com/apache/beam/pull/14410#issuecomment-880838488


   Hi @iemejia / @pabloem / @Amar3tto, this PR introduced some hidden bugs for 
us when upgrading from Beam 2.29.0 to 2.30.0. It changes the default 
`CharSequence` representation of decoded Avro string fields. With 
`ReflectDatum{Reader,Writer}`, `CharSequence`s are backed by Strings by default 
[[1]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java#L229).
The switch to `SpecificDatum{Reader,Writer}` means that, unless the Avro 
field property `java-class` is set to `java.lang.String` on every string field, 
`CharSequence`s are now backed by `org.apache.avro.util.Utf8` by default 
[[2]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L408).
A lot of our users were relying on the default representation being Strings 
and are now seeing runtime errors in their pipelines. Finally, `Utf8`s aren't 
serializable, so there's no default `Coder` implementation for them, and users 
would have to convert them to Java strings anyway if they wanted to do a 
`GroupByKey` (GBK) operation on an Avro field, for example. I created a quick 
Gist to demonstrate the problem: 
[[3]](https://gist.github.com/clairemcginty/97ee6b33c0b5633d5d42d29b1d057d85). 
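
For illustration, here is a minimal sketch of the "convert to Java strings" 
workaround, separate from the Gist above. It is written without Avro on the 
classpath: `StringBuilder` is only a stand-in for `org.apache.avro.util.Utf8` 
(both are `CharSequence`s that do not compare equal to a `String`), and the 
class and method names (`CharSequenceKeys`, `toJavaString`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeys {
    // Hypothetical helper: normalize any CharSequence (a String, or a
    // Utf8-like value) to java.lang.String before using it as a key.
    public static String toJavaString(CharSequence value) {
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Assumption: StringBuilder stands in for a decoded Avro Utf8 value.
        CharSequence decoded = new StringBuilder("user-1");

        // Code that assumed String-backed fields silently stops matching.
        Map<CharSequence, Integer> raw = new HashMap<>();
        raw.put(decoded, 1);
        System.out.println(raw.containsKey("user-1")); // false

        // Converting explicitly restores the String semantics.
        Map<String, Integer> normalized = new HashMap<>();
        normalized.put(toJavaString(decoded), 1);
        System.out.println(normalized.containsKey("user-1")); // true
    }
}
```

In a real pipeline the same conversion would presumably be applied to the 
Avro field before it is used as a key, so that a String-based `Coder` applies.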
   
   Is this something I could bring to the dev@ or user@ mailing list? Let me 
know what you think.

