[
https://issues.apache.org/jira/browse/HIVE-25120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
George Song updated HIVE-25120:
-------------------------------
Description:
Parquet 1.12.0 introduced the modular encryption feature:
https://issues.apache.org/jira/browse/PARQUET-1178
VectorizedParquetRecordReader cannot read Parquet files that have an encrypted footer.
It throws the following exception:
{code:java}
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
    ... 11 more
Caused by: java.lang.RuntimeException: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:156)
    at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
    ... 16 more
Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:588)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readFooterFromFile(VectorizedParquetRecordReader.java:345)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:310)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:222)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:151)
    ... 19 more
{code}
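When triaging the "Trying to read file with encrypted footer" error, it can help to first confirm that the file really is in encrypted-footer mode. The Parquet format ends such files with the magic bytes PARE instead of PAR1. The following is a minimal diagnostic sketch based on that fact. The class ParquetFooterMagic and its detect method are hypothetical helpers, not part of Hive or parquet-mr, and the check looks only at the trailing four bytes without validating the rest of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical diagnostic helper: classify a file by its trailing 4-byte magic. */
final class ParquetFooterMagic {

    enum FooterMode { PLAINTEXT, ENCRYPTED, NOT_PARQUET }

    /** Reads the last four bytes of the file and maps them to a footer mode. */
    static FooterMode detect(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < 4) {
                return FooterMode.NOT_PARQUET; // too short to hold any magic
            }
            byte[] magic = new byte[4];
            raf.seek(length - 4);
            raf.readFully(magic);
            String tail = new String(magic, StandardCharsets.US_ASCII);
            switch (tail) {
                case "PAR1":
                    return FooterMode.PLAINTEXT; // plaintext footer
                case "PARE":
                    return FooterMode.ENCRYPTED; // encrypted footer, keys required
                default:
                    return FooterMode.NOT_PARQUET;
            }
        }
    }
}
```

If detect returns ENCRYPTED for a file in the table location, the reader needs decryption keys to parse the footer, which matches the "No keys available" message in the trace.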
was:
Take, for example, a Parquet table with a column that is an array of integers:
{code:java}
CREATE EXTERNAL TABLE ( `list_of_ints` array<int>)
STORED AS PARQUET
LOCATION '{location}';
{code}
A Parquet file generated by Hive has the following schema for this type:
{code:java}
group list_of_ints (LIST) {
  repeated group bag {
    optional int32 array;
  };
}
{code}
A Parquet file generated by Thrift or by a custom tool (using
org.apache.parquet.io.api.RecordConsumer)
may have the following schema for this type:
{code:java}
required group list_of_ints (LIST) { repeated int32 list_of_ints_tuple }
{code}
VectorizedParquetRecordReader handles only Parquet files generated by Hive.
Because of the changes made in HIVE-18553, it throws the following exception
when it reads a Parquet file generated by Thrift:
{code:java}
Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is not a group
    at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
{code}
I have made a small change to handle the case where the child type of a group
type is a PrimitiveType.
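The idea of accepting a primitive child can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: Node is a hypothetical stand-in for a Parquet schema type (it is not org.apache.parquet.schema.Type), and getElementType here merely mirrors the name of the method seen in the stack trace.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ListElementTypeSketch {

    /** Hypothetical stand-in for a Parquet schema node; not the parquet-mr Type class. */
    static final class Node {
        final String name;
        final boolean primitive;
        final List<Node> children;

        private Node(String name, boolean primitive, List<Node> children) {
            this.name = name;
            this.primitive = primitive;
            this.children = children;
        }

        static Node primitive(String name) {
            return new Node(name, true, Collections.<Node>emptyList());
        }

        static Node group(String name, Node... children) {
            return new Node(name, false, Arrays.asList(children));
        }
    }

    /** Resolves the element type of a LIST group for both layouts described above. */
    static Node getElementType(Node listType) {
        if (listType.primitive || listType.children.isEmpty()) {
            throw new IllegalArgumentException(listType.name + " is not a non-empty group");
        }
        Node repeated = listType.children.get(0);
        if (repeated.primitive) {
            // Thrift-style layout: the repeated child is itself the element.
            return repeated;
        }
        if (repeated.children.isEmpty()) {
            throw new IllegalArgumentException(repeated.name + " has no element field");
        }
        // Hive-style layout: the element is nested inside the repeated group.
        return repeated.children.get(0);
    }
}
```

With the Hive-style schema the sketch returns the inner "array" field; with the Thrift-style schema it returns "list_of_ints_tuple" directly instead of attempting a group cast.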
> VectorizedParquetRecordReader can't read parquet file with encrypted footer
> ---------------------------------------------------------------------------
>
> Key: HIVE-25120
> URL: https://issues.apache.org/jira/browse/HIVE-25120
> Project: Hive
> Issue Type: Bug
> Components: Parquet
> Affects Versions: 3.1.2, 4.0.0
> Reporter: George Song
> Assignee: Ganesha Shreedhara
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)