[
https://issues.apache.org/jira/browse/PARQUET-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934516#comment-16934516
]
Xinli Shang commented on PARQUET-1656:
--------------------------------------
I am working on a fix for this issue in 1.8.1 (upgrading Parquet is another
story). Inside isElementType(), there is the following condition check:
"elementSchema.getFields().size() == 1 &&
elementSchema.getFields().get(0).name().equals(...)"
Does anybody know whether ".size() == 1" is a typo or by design? Or should it
be ".size() >= 1"?
Here is the complete code of this method:
private static boolean isElementType(Type repeatedType, Schema elementSchema) {
  if (repeatedType.isPrimitive() ||
      repeatedType.asGroupType().getFieldCount() > 1) {
    // The repeated type must be the element type because it is an invalid
    // synthetic wrapper (must be a group with one field).
    return true;
  } else if (elementSchema != null &&
      elementSchema.getType() == Schema.Type.RECORD &&
      elementSchema.getFields().size() == 1 &&
      elementSchema.getFields().get(0).name().equals(
          repeatedType.asGroupType().getFieldName(0))) {
    // The repeated type must be the element type because it matches the
    // structure of the Avro element's schema.
    return true;
  }
  return false;
}
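For what it's worth, a minimal standalone sketch of the decision this method makes may help frame the question. The types below (RepeatedGroup, AvroRecord) are hypothetical stand-ins, not the real Parquet/Avro classes; the sketch mirrors only the field-count and first-field-name logic quoted above:

```java
import java.util.List;

// Hypothetical stand-ins for the Parquet repeated group and the Avro element
// schema; only the information isElementType() actually inspects is modeled.
public class ElementTypeHeuristic {

    record RepeatedGroup(boolean primitive, List<String> fieldNames) {}

    record AvroRecord(List<String> fieldNames) {}

    static boolean isElementType(RepeatedGroup repeated, AvroRecord element) {
        if (repeated.primitive() || repeated.fieldNames().size() > 1) {
            // A synthetic list wrapper must be a group with exactly one field,
            // so a primitive or multi-field repeated type has to be the
            // element itself (the "2-level" list encoding).
            return true;
        } else if (element != null
                && element.fieldNames().size() == 1
                && element.fieldNames().get(0).equals(repeated.fieldNames().get(0))) {
            // Single-field repeated group whose only field matches the Avro
            // element record's only field: treat the group as the element.
            // This is the ".size() == 1" comparison in question.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Standard 3-level list: repeated group wrapping one "element" field,
        // while the Avro element is a multi-field record -> wrapper, not element.
        System.out.println(isElementType(
                new RepeatedGroup(false, List.of("element")),
                new AvroRecord(List.of("uuid", "namespace", "version")))); // false

        // Multi-field repeated group: must be the element itself.
        System.out.println(isElementType(
                new RepeatedGroup(false, List.of("uuid", "namespace", "version")),
                new AvroRecord(List.of("uuid", "namespace", "version")))); // true
    }
}
```

Reading it this way, ".size() == 1" looks deliberate rather than a typo: the second branch only needs to disambiguate a single-field repeated group, because the multi-field case was already resolved by the first branch, and ".size() >= 1" would let a multi-field element record match a single-field synthetic wrapper. That said, this is my interpretation, not a confirmed design statement.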
> Schema change results in exception - java.lang.ClassCastException
> ------------------------------------------------------------------
>
> Key: PARQUET-1656
> URL: https://issues.apache.org/jira/browse/PARQUET-1656
> Project: Parquet
> Issue Type: Bug
> Components: parquet-avro
> Affects Versions: 1.8.1, 1.12.0
> Environment: Hoodie/Parquet/Avro
> Parquet-1.8.1
> Avro-1.7.6
> Reporter: Balajee Nagasubramaniam
> Priority: Major
> Labels: Parquet, avro
>
> The following exception was seen with parquet 1.8.1 (and with parquet
> 1.12.0, when trying to reproduce it).
> Exception in thread "main" java.lang.ClassCastException: optional binary
> phone_number (STRING) is not a group
> at
> com.uber.komondor.shaded.org.apache.parquet.schema.Type.asGroupType(Type.java:250)
> at
> com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
> at
> com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:232)
> at
> com.uber.komondor.shaded.org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:78)
> at
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:536)
> at
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:486)
> at
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
> at
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
> at
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
> at
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
> at
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
> at
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
> at
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
> at
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
> at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
> at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
> at
> util.ParquetToAvroSchemaConverter$.convert(ParquetToAvroSchemaConverter.scala:46)
> at
> util.ParquetToAvroSchemaConverter$.main(ParquetToAvroSchemaConverter.scala:20)
> at util.ParquetToAvroSchemaConverter.main(ParquetToAvroSchemaConverter.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> The original exception was triggered by the following schema change.
> Schema before change:
> {
> "default": null,
> "name": "master_cluster",
> "type": [
> "null",
> {
> "fields": [
> {
> "name": "uuid",
> "type": "string"
> },
> {
> "name": "namespace",
> "type": "string"
> },
> {
> "name": "version",
> "type": "long"
> }
> ],
> "name": "master_cluster",
> "type": "record"
> }
> ]
> },
> After schema change:
> {
> "default": null,
> "name": "master_cluster",
> "type": [
> "null",
> {
> "fields": [
> {
> "default": null,
> "name": "uuid",
> "type": [
> "null",
> "string"
> ]
> },
> {
> "default": null,
> "name": "namespace",
> "type": [
> "null",
> "string"
> ]
> },
> {
> "default": null,
> "name": "version",
> "type": [
> "null",
> "long"
> ]
> }
> ],
> "name": "VORGmaster_cluster",
> "type": "record"
> }
> ]
> },
> We suspected PARQUET-1441 could be in play, tried to reproduce the issue on
> parquet-1.12.0, and saw the same exception.
> During the repro we noticed that the issue could be with the Avro schema
> conversion (the field name was substituted with the generic name "array").
> While we look into this further, we want to get community input on whether
> this is a known issue and any thoughts on a path forward.
> 19/09/12 22:34:37 DEBUG avro.SchemaCompatibility: Checking compatibility of
> reader
> {"type":"record","name":"IDphones_items","fields":[{"name":"phone_number","type":["null","string"],"default":null}]}
> with writer
> {"type":"record","name":"array","fields":[{"name":"phone_number","type":["null","string"],"default":null}]}
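The name mismatch in that DEBUG output (reader record "IDphones_items" vs. writer record "array") is by itself enough to fail Avro schema resolution, even though the field lists are identical. As a hedged illustration: the sketch below is not the real org.apache.avro.SchemaCompatibility API, it only models the record-name step of the resolution rules (the writer's record name must match the reader's full name or one of the reader's aliases before fields are compared):

```java
import java.util.Set;

// Sketch of the first step of Avro schema resolution for records. Simplified;
// the real logic lives in org.apache.avro.SchemaCompatibility and also
// performs field-by-field matching after the name check passes.
public class RecordNameCheck {

    static boolean namesResolve(String readerFullName,
                                Set<String> readerAliases,
                                String writerFullName) {
        return readerFullName.equals(writerFullName)
                || readerAliases.contains(writerFullName);
    }

    public static void main(String[] args) {
        // Names from the DEBUG log above: the converted writer schema came out
        // as "array", while the reader expects "IDphones_items".
        System.out.println(namesResolve("IDphones_items", Set.of(), "array")); // false

        // Declaring "array" as an alias on the reader record is one
        // conventional Avro-side workaround for renamed records.
        System.out.println(namesResolve("IDphones_items", Set.of("array"), "array")); // true
    }
}
```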
--
This message was sent by Atlassian Jira
(v8.3.4#803005)