Ganesha Shreedhara created HIVE-22670:
-----------------------------------------

             Summary: ArrayIndexOutOfBoundsException when vectorized reader is 
used for reading a parquet file
                 Key: HIVE-22670
                 URL: https://issues.apache.org/jira/browse/HIVE-22670
             Project: Hive
          Issue Type: Bug
    Affects Versions: 2.3.6, 3.1.2
            Reporter: Ganesha Shreedhara
            Assignee: Ganesha Shreedhara


An ArrayIndexOutOfBoundsException is thrown while decoding the dictionaryIds 
of a row group in a Parquet file when vectorization is enabled. 

*Exception stack trace:*
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
 at org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
 at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
 at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
 ... 24 more{code}
 

This issue seems to be caused by re-using the same dictionary column vector 
while reading consecutive row groups. It looks like a corner-case bug that 
occurs for certain distributions of dictionary-encoded and plain-encoded data, 
when the underlying bit-packed dictionary data is read into a 
column-vector-based data structure. 
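
The suspected failure mode can be illustrated with a minimal, self-contained 
sketch. It does not use Hive or Parquet classes: Dictionary, sharedIds and the 
method names below are hypothetical stand-ins, and the scenario (ids left over 
from one row group being decoded against the empty dictionary of the next) is 
an assumption about the cause, not a confirmed diagnosis.

```java
import java.util.Arrays;

class StaleDictionaryIdsSketch {

    /** Hypothetical stand-in for the dictionary of one row group. */
    static final class Dictionary {
        private final String[] values;

        Dictionary(String... values) {
            this.values = values;
        }

        String decode(int id) {
            // Throws ArrayIndexOutOfBoundsException when id is not a valid
            // index into this row group's dictionary.
            return values[id];
        }
    }

    /** Decodes the first count ids against the given dictionary. */
    static String[] decode(int[] ids, int count, Dictionary dictionary) {
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = dictionary.decode(ids[i]);
        }
        return out;
    }

    /** Assumed failure: one id buffer is reused across two row groups. */
    static boolean staleIdsThrow() {
        int[] sharedIds = {0, 1, 1, 0};
        // Row group 1: every id is valid for its two-entry dictionary.
        decode(sharedIds, sharedIds.length, new Dictionary("a", "b"));

        // Row group 2: assumed to carry no dictionary entries (for example,
        // plain-encoded data), yet the stale ids are decoded again.
        Dictionary empty = new Dictionary();
        try {
            decode(sharedIds, sharedIds.length, empty);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Contrast: ids that belong to the row group being decoded. */
    static String[] perRowGroupIds() {
        int[] ids = {0, 0};
        return decode(ids, ids.length, new Dictionary("x"));
    }

    public static void main(String[] args) {
        System.out.println("stale ids throw: " + staleIdsThrow());
        System.out.println("fresh ids decode: " + Arrays.toString(perRowGroupIds()));
    }
}
```

In this simplified model, the exception goes away once the id buffer holds 
only ids that belong to the row group whose dictionary is used for decoding. 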



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
