Kristoffer Sjögren created PARQUET-112:
------------------------------------------

             Summary: RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.
                 Key: PARQUET-112
                 URL: https://issues.apache.org/jira/browse/PARQUET-112
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
         Environment: Java 1.7 Linux Debian
            Reporter: Kristoffer Sjögren


I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet format. This works fine for a few gigabytes but blows up in the RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.

parquet.io.ParquetDecodingException: Can not read value at 19453 in block 0 in file hdfs://nn-ix01.se-ix.delta.prod:8020/user/stoffe/parquet/dogfight/2014/09/29/part-m-00153.snappy.parquet
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
        at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
        at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:157)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: parquet.io.ParquetDecodingException: Can't read value in column [action] BINARY at value 697332 out of 872236, 96921 out of 96921 in currentPage. repetition level: 0, definition level: 1
        at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
        at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:414)
        at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:64)
        at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:69)
        at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:71)
        at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:57)
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
        at parquet.Preconditions.checkArgument(Preconditions.java:47)
        at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
        at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
        at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:73)
        at parquet.column.impl.ColumnReaderImpl$2$7.read(ColumnReaderImpl.java:311)
        at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
        ... 19 more
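For context, the innermost cause comes from the hybrid RLE/bit-packed encoding that Parquet uses for dictionary indices: the decoder throws "Reading past RLE/BitPacking stream." when it needs more values than the remaining bytes can supply. Below is a minimal sketch of that decoding loop, simplified from the Parquet format spec; the class name RleHybridSketch and its exact shape are mine, not the parquet-mr implementation:

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified RLE/bit-packing hybrid decoder. Each run starts with a varint
// header: (len << 1) for an RLE run of `len` repeated values, or
// ((groups << 1) | 1) for `groups` groups of 8 bit-packed values.
public class RleHybridSketch {

    public static int[] decode(InputStream in, int bitWidth, int count) throws IOException {
        int[] out = new int[count];
        int i = 0;
        while (i < count) {
            int header = readUnsignedVarInt(in);
            if ((header & 1) == 0) {
                // RLE run: next ceil(bitWidth/8) bytes hold the repeated value (little-endian)
                int runLen = header >>> 1;
                int value = 0;
                int bytes = (bitWidth + 7) / 8;
                for (int b = 0; b < bytes; b++) {
                    value |= requireByte(in) << (8 * b);
                }
                for (int r = 0; r < runLen && i < count; r++) out[i++] = value;
            } else {
                // Bit-packed run: values packed LSB-first, bitWidth bits each
                int groups = header >>> 1;
                long buf = 0;
                int bits = 0;
                int mask = (1 << bitWidth) - 1;
                for (int v = 0; v < groups * 8 && i < count; v++) {
                    while (bits < bitWidth) {
                        buf |= ((long) requireByte(in)) << bits;
                        bits += 8;
                    }
                    out[i++] = (int) (buf & mask);
                    buf >>>= bitWidth;
                    bits -= bitWidth;
                }
            }
        }
        return out;
    }

    // This is where the equivalent of the reported exception fires: the caller
    // still expects values, but the underlying stream is exhausted.
    private static int requireByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new IllegalArgumentException("Reading past RLE/BitPacking stream.");
        return b;
    }

    private static int readUnsignedVarInt(InputStream in) throws IOException {
        int value = 0, shift = 0, b;
        do {
            b = requireByte(in);
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}
```

Given the stack trace, the check at RunLengthBitPackingHybridDecoder.java:80 is hit exactly in the `requireByte`-style situation above: the column reader asks for dictionary index 697333 of 872236, but the RLE/bit-packed stream for the page has already been consumed.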



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)