Hi all,

When upgrading from Beam 2.29.0 to 2.30.0, we encountered some unexpected runtime issues due to changes from BEAM-2303 <https://github.com/apache/beam/pull/14410>. This PR updated AvroCoder to use SpecificDatum{Reader,Writer} instead of ReflectDatum{Reader,Writer} in its implementation.

With the Reflect* suite, Avro string fields have getters/setters defined with a CharSequence signature, but are decoded as java.lang.String by default [1] <https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java#L229>. The Specific* suite has a different default for decoding Avro string fields: unless the Avro schema property "java-class" is set to "java.lang.String", the decoded CharSequences are org.apache.avro.util.Utf8 objects [2] <https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L408>.

This is causing some migration pain for us: we have to either add the java-class property to every string field schema, or call .toString() on a lot of fields that we could simply cast to String before. Additionally, Utf8 isn't Serializable, and there is no default Coder for it. Also, Beam's AvroSink/AvroSource still use the Reflect* reader/writer.

I created a quick Gist to demonstrate the issue: [3] <https://gist.github.com/clairemcginty/97ee6b33c0b5633d5d42d29b1d057d85>.

I'm wondering if there's any possibility of making the choice of Reflect* vs. Specific* configurable in AvroCoder, or of setting a default String type in the coder constructor. If not, maybe this change should be documented in the release notes?

Thanks,
Claire
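
P.S. To illustrate the cast problem without needing Avro on the classpath, here is a minimal stand-alone sketch. It is not the code from the Gist. FakeUtf8, viaCast, and viaToString are hypothetical names, and FakeUtf8 only stands in for org.apache.avro.util.Utf8 as "a CharSequence that is not a java.lang.String"; it does not reproduce the real class.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: FakeUtf8 is a hypothetical stand-in for org.apache.avro.util.Utf8,
// used so that this example compiles without Avro on the classpath.
class Utf8CastSketch {

    /** A CharSequence that is not a java.lang.String, like Avro's Utf8. */
    static final class FakeUtf8 implements CharSequence {
        private final byte[] bytes;

        FakeUtf8(String value) {
            this.bytes = value.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Access by cast: works only if the decoder produced a String. */
    static String viaCast(CharSequence field) {
        return (String) field;
    }

    /** Access via toString(): works for String and for Utf8-like values. */
    static String viaToString(CharSequence field) {
        return field == null ? null : field.toString();
    }

    public static void main(String[] args) {
        // Assumption: models a field decoded by the Reflect* reader (a String).
        CharSequence reflectStyle = "hello";
        // Assumption: models a field decoded by the Specific* reader by default.
        CharSequence specificStyle = new FakeUtf8("hello");

        System.out.println(viaCast(reflectStyle));
        System.out.println(viaToString(specificStyle));
        try {
            viaCast(specificStyle);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

The viaToString style is the workaround we are applying for now where we can't change the schema. It is safe for either decoder, but it means touching every call site that used to cast.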
