[ https://issues.apache.org/jira/browse/PARQUET-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200419#comment-17200419 ]

Tristan Davolt commented on PARQUET-112:
----------------------------------------

I am facing the same issue with Parquet 1.10.0. Data is written with 
AvroParquetWriter using Snappy compression. Occasionally, one of the many files 
we write by the same method throws an error similar to the one above when read 
by any Parquet reader. I have not yet found a workaround. The exception is 
thrown for the final value of a random column, and it does not occur only with 
null fields, even though our schema defines every field as optional.
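
For context, the write path looks roughly like this (a minimal sketch; the 
schema, field names, and output path are placeholders, not taken from this 
report):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class SnappyWriteSketch {
    public static void main(String[] args) throws Exception {
        // Every field is optional (nullable), as in the schema described above.
        Schema schema = SchemaBuilder.record("Event").fields()
                .optionalString("action")
                .optionalLong("ts")
                .endRecord();

        // Placeholder path; the real files are written the same way in bulk.
        Path path = new Path("/tmp/example.snappy.parquet");
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(path)
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("action", "click");
            record.put("ts", 1234567890L);
            writer.write(record);
        }
    }
}
{code}

The trace below is what reading one of the affected files produces: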


{code:java}
java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
      at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
      at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
      at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
      at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
      at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
      at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:568)
      at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
      at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
      at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:358)
      at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:231)
      at org.apache.parquet.tools.command.DumpCommand.execute(DumpCommand.java:148)
      at org.apache.parquet.tools.Main.main(Main.java:223)
{code}
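
Judging by the Main and DumpCommand frames, the trace above comes from 
parquet-tools dump. A plain programmatic read hits the same decoder; a minimal 
read sketch, assuming AvroParquetReader and a placeholder path:

{code:java}
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path to an affected file.
        Path path = new Path("/tmp/example.snappy.parquet");
        long count = 0;
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(path).build()) {
            GenericRecord record;
            // Decoding the definition levels of each optional field goes
            // through RunLengthBitPackingHybridDecoder; on a corrupt file the
            // exception above is thrown at the final value of the affected
            // column.
            while ((record = reader.read()) != null) {
                count++;
            }
        }
        System.out.println("read " + count + " records");
    }
}
{code}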

> RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-112
>                 URL: https://issues.apache.org/jira/browse/PARQUET-112
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>         Environment: Java 1.7 Linux Debian
>            Reporter: Kristoffer Sjögren
>            Assignee: Reuben Kuhnert
>            Priority: Major
>
> I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet 
> format. This works fine for a few gigabytes but blows up in the 
> RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.
> {code}
> parquet.io.ParquetDecodingException: Can not read value at 19453 in block 0 in file hdfs://nn-ix01.se-ix.delta.prod:8020/user/stoffe/parquet/dogfight/2014/09/29/part-m-00153.snappy.parquet
>       at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
>       at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>       at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:157)
>       at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
>       at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
>       at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [action] BINARY at value 697332 out of 872236, 96921 out of 96921 in currentPage. repetition level: 0, definition level: 1
>       at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
>       at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:414)
>       at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:64)
>       at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:69)
>       at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:71)
>       at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:57)
>       at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
>       ... 13 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
>       at parquet.Preconditions.checkArgument(Preconditions.java:47)
>       at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>       at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>       at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:73)
>       at parquet.column.impl.ColumnReaderImpl$2$7.read(ColumnReaderImpl.java:311)
>       at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
>       ... 19 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
