[ https://issues.apache.org/jira/browse/AVRO-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756058#comment-17756058 ]
Etienne Hardy commented on AVRO-3161:
-------------------------------------
Hey [~clesaec], the context for us is Kafka events whose payloads are
Avro-encoded. I know about the Avro Maven plugin's stringType parameter; we have
already set it to String. That parameter controls the static type of the
generated Java object's properties. For Avro string arrays, however, the values
that are deserialized and put into the list still have the runtime type Utf8,
even though the static type of the array is java.util.List<String>.
Stepping into the deserialization code in GenericDatumReader, we found that the
[readString()|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L454-L463]
method checks the avro.java.string property of the string schema being read
(for an array, its items schema) to decide whether each value should be read as
a java.lang.String or as a java.lang.CharSequence. When the avro.java.string
property is absent, the string values are read as Utf8, which implements
java.lang.CharSequence but is not a java.lang.String. The ClassCastException
occurs when we then read an element of the list as a java.lang.String.
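A toy model of the check described above may make the behaviour easier to follow. It is a sketch, not Avro's implementation: the schema is reduced to a hypothetical map of its properties, readString here only mirrors the name of the real method, and StringBuilder stands in for Utf8, since both implement CharSequence without being a java.lang.String.
{code:java}
import java.util.Map;

// Toy model, NOT Avro's code: the schema is reduced to its property map, and
// StringBuilder stands in for org.apache.avro.util.Utf8 (a CharSequence, not a String).
class StringClassDecision {

    static final String STRING_PROP = "avro.java.string";

    // Simplified version of the decision: "String" means java.lang.String,
    // anything else (including no property) means a Utf8-like CharSequence.
    static CharSequence readString(Map<String, String> itemSchemaProps, String decoded) {
        if ("String".equals(itemSchemaProps.get(STRING_PROP))) {
            return decoded;                // property present: a java.lang.String
        }
        return new StringBuilder(decoded); // property absent: Utf8-like CharSequence
    }

    public static void main(String[] args) {
        CharSequence withProp = readString(Map.of(STRING_PROP, "String"), "a");
        CharSequence withoutProp = readString(Map.of(), "a");
        System.out.println(withProp.getClass().getSimpleName());    // String
        System.out.println(withoutProp.getClass().getSimpleName()); // StringBuilder
    }
}
{code}
Only the second case matches what we see for string arrays: the static type says String, but the element is some other CharSequence.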
Hope this makes sense. I have a self-contained reproducible test case I can
share if that helps.
> bad classcast, avro-maven-plugin not respecting configured stringType for collections
> ------------------------------------------------------------------------------------
>
> Key: AVRO-3161
> URL: https://issues.apache.org/jira/browse/AVRO-3161
> Project: Apache Avro
> Issue Type: Bug
> Affects Versions: 1.9.2
> Reporter: Martin Mucha
> Priority: Major
>
> Using an Avro schema that defines the fields as:
> {code:json}
> {
>   "name": "field1",
>   "type": "string"
> },{code}
> and
> {code:json}
> {
>   "name": "field2",
>   "type": ["null", {
>     "type": "array",
>     "name": "field2Array",
>     "items": {
>       "type": "string"
>     }
>   }],
>   "default": null
> }{code}
>
> the avro-maven-plugin generates a put method that looks like this:
>
> {code:java}
> public void put(int field$, java.lang.Object value$) {
>   switch (field$) {
>   case 1: field1 = value$ != null ? value$.toString() : null; break;
>   case 19: field2 = (java.util.List<java.lang.String>)value$; break;
>   default: throw new org.apache.avro.AvroRuntimeException("Bad index");
>   }
> }{code}
>
> The problem is that `value$.toString()` correctly turns a Utf8 into a String,
> whereas the unchecked cast of List<Utf8> to List<String> merely satisfies the
> compiler: the items are still of type Utf8 at runtime.
> Plugin configuration:
> {code:xml}
> <plugin>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro-maven-plugin</artifactId>
>   <version>1.9.2</version>
>   <executions>
>     <execution>
>       <id>generateClassesFromTestSchemata</id>
>       <phase>generate-sources</phase>
>       <goals>
>         <goal>schema</goal>
>       </goals>
>       <configuration>
>         <stringType>String</stringType>
>         <fieldVisibility>PRIVATE</fieldVisibility>
>         <testSourceDirectory>...</testSourceDirectory>
>         <testOutputDirectory>...</testOutputDirectory>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>{code}
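The unchecked cast in the generated put() quoted above can be reproduced with the JDK alone. The sketch below is an illustration, not Avro code: StringBuilder stands in for org.apache.avro.util.Utf8, and the helpers decodedItems, uncheckedCast and toStringList are hypothetical names. It shows that the cast compiles, that the ClassCastException only surfaces when an element is read as a String, and one possible workaround: copying the list and calling toString() on each element, the same conversion put() already applies to field1. This is not necessarily how Avro itself handles the case.
{code:java}
import java.util.ArrayList;
import java.util.List;

// JDK-only illustration, not Avro code. StringBuilder stands in for
// org.apache.avro.util.Utf8: a CharSequence that is not a java.lang.String.
class UncheckedListCastDemo {

    // Hypothetical stand-in for the List<Utf8> produced by the reader.
    static List<CharSequence> decodedItems() {
        List<CharSequence> items = new ArrayList<>();
        items.add(new StringBuilder("a"));
        items.add(new StringBuilder("b"));
        return items;
    }

    // Mirrors the generated put(): an unchecked cast the compiler accepts.
    @SuppressWarnings("unchecked")
    static List<String> uncheckedCast(Object value) {
        return (List<String>) value;
    }

    // Hypothetical workaround: copy the list, converting each element with
    // toString() as put() already does for field1; null stays null (union branch).
    static List<String> toStringList(Object value) {
        if (value == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (Object item : (List<?>) value) {
            result.add(item != null ? item.toString() : null);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> polluted = uncheckedCast(decodedItems());
        try {
            // javac inserts a checkcast to String here, which fails at runtime.
            String firstPolluted = polluted.get(0);
            System.out.println("unexpected: " + firstPolluted);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }

        List<String> converted = toStringList(decodedItems());
        String firstConverted = converted.get(0);
        System.out.println(firstConverted); // a
    }
}
{code}
Converting at assignment time costs one copy of the list, but it makes the runtime element type match the declared List<String>, which the plain cast cannot do.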
--
This message was sent by Atlassian Jira
(v8.20.10#820010)