[ 
https://issues.apache.org/jira/browse/PARQUET-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598508#comment-17598508
 ] 

J Y commented on PARQUET-2181:
------------------------------

the more i think about this, the more i believe that using avro as the basis 
for reading parquet in parquet-cli is broken. for example, {{Argument error: 
Map key type must be binary (UTF8): required int32 key}} occurs because avro 
requires all map keys to be strings; parquet and protobuf have no such 
limitation. avro as the schema definition doesn't seem expressive enough to 
interoperate easily with these other formats.
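
a minimal sketch of the restriction behind that error, in plain java with no parquet or avro dependency. the class {{MapKeyCheck}} and its method are hypothetical names made up for illustration, not the parquet-avro implementation; the only fact it encodes is that an avro map key must be a string, so only a binary (UTF8) parquet key can be carried over.

```java
/** Hypothetical illustration only; this is not code from parquet-avro. */
final class MapKeyCheck {
    private MapKeyCheck() {}

    /**
     * Returns true when a Parquet map key could be represented as an Avro
     * map key. Avro maps only allow string keys, so the key must be a
     * binary column annotated as UTF8.
     */
    static boolean avroCompatibleKey(String primitiveType, String annotation) {
        return "binary".equals(primitiveType) && "UTF8".equals(annotation);
    }

    public static void main(String[] args) {
        // A string-keyed map fits the Avro model.
        System.out.println(avroCompatibleKey("binary", "UTF8"));
        // An int32-keyed map (legal in Parquet and protobuf) does not.
        System.out.println(avroCompatibleKey("int32", null));
    }
}
```

under this model an int32 key, which parquet and protobuf both accept, has no avro equivalent, so any reader built on an avro schema has to reject it or convert it.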

> parquet-cli fails at supporting parquet-protobuf generated files
> ----------------------------------------------------------------
>
>                 Key: PARQUET-2181
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2181
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cli
>            Reporter: J Y
>            Priority: Critical
>         Attachments: samples.tgz
>
>
> i generated a parquet file using a protobuf with this proto definition:
> {code:java}
> message IndexPath {
>   // Index of item in path.
>   repeated int32 index = 1;
> }
> message SomeEvent {
>   // truncated/obfuscated wrapper
>   optional IndexPath client_position = 1;
> }
> {code}
> this gets translated to the following parquet schema using the new compliant 
> schema for lists:
> {code:java}
> message SomeEvent {
>   optional group client_position = 1 {
>     optional group index (LIST) = 1 {
>       repeated group list {
>         required int32 element;
>       }
>     }
>   }
> }
> {code}
> this causes parquet-cli cat to barf on a file containing these events:
> {quote}java.lang.RuntimeException: Failed on record 0
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:86)
>         at org.apache.parquet.cli.Main.run(Main.java:157)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.parquet.cli.Main.main(Main.java:187)
> Caused by: java.lang.ClassCastException: required int32 element is not a group
>         at org.apache.parquet.schema.Type.asGroupType(Type.java:248)
>         at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:228)
>         at org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:74)
>         at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:539)
>         at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:489)
>         at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:293)
>         at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
>         at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:284)
>         at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:137)
>         at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:91)
>         at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>         at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:142)
>         at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:190)
>         at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:166)
>         at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>         at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:363)
>         at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:344)
>         at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:342)
>         at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:73)
>         ... 3 more
> {quote}
> using the old parquet-tools binary to cat this file works fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
