[ 
https://issues.apache.org/jira/browse/AVRO-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756058#comment-17756058
 ] 

Etienne Hardy commented on AVRO-3161:
-------------------------------------

Hey [~clesaec], the context for us is Kafka events whose payloads are 
Avro-encoded. I know about the Avro Maven plugin's stringType parameter; we 
have already set it to String. This parameter controls the static type of the 
generated Java object's properties, but for Avro string arrays the values 
deserialized into the list still end up as Utf8 objects at runtime, even 
though the static type of the array is java.util.List<String>.
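
To illustrate the mismatch between static and runtime type, here is a minimal sketch that does not depend on Avro. FakeUtf8 is a hypothetical stand-in for org.apache.avro.util.Utf8 (any CharSequence that is not a String), and uncheckedCast, readFails and toStrings are illustrative names, not part of the generated code or of Avro itself.

```java
import java.util.ArrayList;
import java.util.List;

public class StringListCastDemo {

    // Hypothetical stand-in for org.apache.avro.util.Utf8:
    // a CharSequence implementation that is not a java.lang.String.
    static final class FakeUtf8 implements CharSequence {
        private final String value;
        FakeUtf8(String value) { this.value = value; }
        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return new FakeUtf8(value.substring(start, end));
        }
        @Override public String toString() { return value; }
    }

    // Same kind of unchecked cast as in the generated put() method.
    // It compiles (with a warning) and never fails at this point,
    // because generics are erased at runtime.
    @SuppressWarnings("unchecked")
    static List<String> uncheckedCast(Object value) {
        return (List<String>) value;
    }

    // Returns true if reading the first element as a String
    // throws a ClassCastException.
    static boolean readFails(List<String> list) {
        try {
            String first = list.get(0); // the implicit cast to String happens here
            first.length();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // One possible workaround on the consumer side: copy the elements,
    // converting each CharSequence with toString().
    static List<String> toStrings(List<? extends CharSequence> values) {
        List<String> out = new ArrayList<>(values.size());
        for (CharSequence cs : values) {
            out.add(cs == null ? null : cs.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<CharSequence> decoded = new ArrayList<>();
        decoded.add(new FakeUtf8("abc"));

        List<String> polluted = uncheckedCast(decoded);
        System.out.println(readFails(polluted));            // true
        System.out.println(readFails(toStrings(polluted))); // false
    }
}
```

The cast itself succeeds; the failure only shows up later, at the first place where an element is actually used as a String. Copying through toStrings is just a workaround sketch for consumers, not a fix for the generated code.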

Stepping into the deserialization code in GenericDatumReader, the 
[readString()|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L454-L463]
 method checks the avro.java.string property of the field's schema in order to 
determine if the array value should be read as a java.lang.String or as a 
java.lang.CharSequence. Without the avro.java.string property specified, the 
string values are read as Utf8, which implements java.lang.CharSequence. The 
ClassCastException happens for us when we then try to get a java.lang.String 
object from the object's list.
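
As a rough model of that decision (a simplified sketch, not Avro's actual code), the logic can be pictured as below. The schema properties are modelled as a plain Map, and Utf8Like and readString here are hypothetical names standing in for Avro's Utf8 and the reader method; only the avro.java.string property key comes from Avro.

```java
import java.util.Collections;
import java.util.Map;

public class ReadStringModel {

    // Property key that the reader looks up on the string schema.
    static final String STRING_PROP = "avro.java.string";

    // Hypothetical stand-in for org.apache.avro.util.Utf8.
    static final class Utf8Like implements CharSequence {
        private final String value;
        Utf8Like(String value) { this.value = value; }
        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return new Utf8Like(value.substring(start, end));
        }
        @Override public String toString() { return value; }
    }

    // Simplified model: return a java.lang.String only when the schema
    // carries avro.java.string = "String"; otherwise return a
    // CharSequence that is not a String.
    static Object readString(Map<String, String> schemaProps, String decoded) {
        if ("String".equals(schemaProps.get(STRING_PROP))) {
            return decoded;
        }
        return new Utf8Like(decoded);
    }

    public static void main(String[] args) {
        Map<String, String> withProp =
            Collections.singletonMap(STRING_PROP, "String");
        Map<String, String> withoutProp = Collections.emptyMap();

        System.out.println(readString(withProp, "abc") instanceof String);    // true
        System.out.println(readString(withoutProp, "abc") instanceof String); // false
    }
}
```

In this model, an array whose items schema lacks the property yields non-String elements regardless of the static type declared on the generated field, which matches what we observe.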

Hope this makes sense. I have a self-contained reproducible test case I can 
share if that helps.

> bad classcast, avro-maven-plugin not respecting configured stringType for 
> collections
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-3161
>                 URL: https://issues.apache.org/jira/browse/AVRO-3161
>             Project: Apache Avro
>          Issue Type: Bug
>    Affects Versions: 1.9.2
>            Reporter: Martin Mucha
>            Priority: Major
>
> Using an Avro schema that defines elements as:
> {code:java}
> {
>   "name": "field1",
>   "type": "string"
> },{code}
> and 
> {code:java}
> {
>   "name": "field2",
>   "type": ["null", {
>     "type": "array",
>     "name": "field2Array",
>     "items": {
>       "type": "string"
>     }
>   }],
>   "default": null
> }{code}
>  
> the avro-maven-plugin will generate a put method that looks like this:
>  
> {code:java}
> public void put(int field$, java.lang.Object value$) {
>   switch (field$) {
>     case 1: field1 = value$ != null ? value$.toString() : null; break;
>     case 19: field2 = (java.util.List<java.lang.String>)value$; break;
>     default: throw new org.apache.avro.AvroRuntimeException("Bad index");
>   }
> }{code}
>  
> The problem is that `value$.toString()` will correctly turn a Utf8 into a 
> String, while the unchecked cast of List<Utf8> to List<String> will 
> successfully trick the compiler, but the items will still be of type Utf8.
> Plugin configuration:
> {code:java}
> <plugin>
>   <groupId>org.apache.avro</groupId>
>   <artifactId>avro-maven-plugin</artifactId>
>   <version>1.9.2</version>
>   <executions>
>     <execution>
>       <id>generateClassesFromTestSchemata</id>
>       <phase>generate-sources</phase>
>       <goals>
>         <goal>schema</goal>
>       </goals>
>       <configuration>
>         <stringType>String</stringType>
>         <fieldVisibility>PRIVATE</fieldVisibility>
>         <testSourceDirectory>...</testSourceDirectory>
>         <testOutputDirectory>...</testOutputDirectory>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
