ArnavBalyan opened a new pull request, #3287:
URL: https://github.com/apache/parquet-java/pull/3287
### Summary
- Currently parquet-cli fails when reading Parquet files written through
parquet-protobuf.
- The CLI reads every file with AvroReadSupport and AvroRecordConverter,
which fail on protobuf-written files because the underlying schema and data
differ from what the Avro path expects.
- The CLI reader now supports protobuf-written Parquet files by routing the
read to the simple group factory.
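
The routing described in the summary can be sketched roughly as below. This is an illustration only, not the code in this PR: the class `ReaderRouting`, the enum `ReaderKind`, and the method `chooseReader` are hypothetical names, and the two footer keys are assumed to be the ones parquet-protobuf stores in the file's key/value metadata (worth verifying against the parquet-protobuf version in use).

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch: pick a read path from a Parquet footer's key/value metadata. */
public class ReaderRouting {

    /** Illustrative names for the two read paths discussed in the summary. */
    public enum ReaderKind { AVRO, GROUP }

    // Assumption: keys that parquet-protobuf adds to the footer metadata when writing.
    static final String PROTO_CLASS_KEY = "parquet.proto.class";
    static final String PROTO_DESCRIPTOR_KEY = "parquet.proto.descriptor";

    /** Returns GROUP for files that look protobuf-written, AVRO otherwise. */
    public static ReaderKind chooseReader(Map<String, String> footerMetadata) {
        boolean writtenByProto = footerMetadata.containsKey(PROTO_CLASS_KEY)
                || footerMetadata.containsKey(PROTO_DESCRIPTOR_KEY);
        return writtenByProto ? ReaderKind.GROUP : ReaderKind.AVRO;
    }

    public static void main(String[] args) {
        // A footer carrying a proto descriptor takes the group-based path.
        System.out.println(chooseReader(
                Collections.singletonMap(PROTO_DESCRIPTOR_KEY, "...")));
        // Anything else keeps the existing Avro path.
        System.out.println(chooseReader(
                Collections.singletonMap("parquet.avro.schema", "{}")));
    }
}
```

In a real reader the `GROUP` branch would presumably build the reader with a group-based read support (for example `GroupReadSupport` from parquet-hadoop's example package) rather than `AvroReadSupport`; how this PR wires that up is not shown here.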
#### - Before:
```
Time elapsed: 1.351 s <<< ERROR!
java.lang.RuntimeException: Failed on record 0 in file /tmp/junit149783857212573183/proto_someevent.parquet
    at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:89)
    at org.apache.parquet.cli.commands.CatCommandTest.testCatCommandProtoParquetAutoDetected(CatCommandTest.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/tmp/junit149783857212573183/proto_someevent.parquet
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:140)
    at org.apache.parquet.cli.BaseCommand$2$1.advance(BaseCommand.java:407)
    at org.apache.parquet.cli.BaseCommand$2$1.<init>(BaseCommand.java:388)
    at org.apache.parquet.cli.BaseCommand$2.iterator(BaseCommand.java:386)
    at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:76)
    at org.apache.parquet.cli.commands.CatCommandTest.testCatCommandProtoParquetAutoDetected(CatCommandTest.java:82)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] CatCommandTest.testCatCommandProtoParquetAutoDetected:82 » Runtime Failed on record 0 in file /tmp/junit149783857212573183/proto_someevent.parquet
```
#### - After:
```
[INFO] Running org.apache.parquet.cli.commands...
repeatedInt: 1
repeatedInt: 2
repeatedInt: 3
```
(Successful read)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]