[ https://issues.apache.org/jira/browse/PARQUET-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840323#comment-16840323 ]
Gabor Szadovszky commented on PARQUET-1575:
-------------------------------------------
parquet-mr 1.11 has not been released yet, so 1.12 is not even planned. Could you
please provide the exact commit ID you tested with?

I was not able to reproduce the issue. Could you provide more details (e.g. the
schema of the file, the exact number of records, etc.) or a unit test for
reproduction?
> Parquet reader throws error "Reading past RLE/BitPacking stream" for parquet
> file with null values
> --------------------------------------------------------------------------------------------------
>
> Key: PARQUET-1575
> URL: https://issues.apache.org/jira/browse/PARQUET-1575
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.0
> Reporter: shyam narayan singh
> Priority: Major
>
> Recently moved from parquet 1.8.x to 1.12.
> The dataset has more than 20k null values to be written to a complex type.
> With 1.8.x, the writer would create a single page, but with 1.12 it creates
> 20 pages (PARQUET-1414). Writing nulls to complex types is optimised with a
> null cache: nulls are cached and flushed on the next non-null value or on an
> explicit flush/close. With 1.8, the writer would have reached the explicit
> close, flushed the null cache and written the page. With 1.12, the page is
> written prematurely after 20k values have been encountered.
>
> Below is the metadata dump in both cases.
> 1.8 :
> index._id TV=111396 RL=0 DL=2
> ----------------------------------------------------------------------------
> page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN ST:[num_nulls: 111396, min/max not defined] SZ:8 VC:111396
>
> 1.12 :
> index._index TV=111396 RL=0 DL=2
> ----------------------------------------------------------------------------
> page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN ST:[no stats for this column] SZ:4 VC:0
> ......
> page 19: DLE:RLE RLE:BIT_PACKED VLE:PLAIN ST:[no stats for this column] SZ:8 VC:111396
>
> All the pages in 1.12 except the last one have the same metadata. The issue
> is that when the parquet reader kicks in, it sees that the RLE is bit packed
> and reads 8 bytes, which goes beyond the end of the stream because its size
> is only 4 ("Reading past RLE/BitPacking stream").
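
The null-cache behaviour described in the quoted report can be illustrated with a toy model. This is only a sketch of the reporter's description, not parquet-mr code: the class NullCacheSketch and all of its methods are hypothetical names, and the page limit of 20 values stands in for the 20k limit mentioned above. It shows how a page that is cut before the null cache is flushed ends up with a value count of 0, while the final close puts every cached null into the last page.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a null cache. Hypothetical names, not parquet-mr classes. */
public class NullCacheSketch {
    /** Values already written into the page currently being built. */
    private int valuesInCurrentPage = 0;
    /** Nulls that have been seen but not yet written to any page. */
    private int cachedNulls = 0;
    /** Value count (VC) of every page written so far. */
    private final List<Integer> pageValueCounts = new ArrayList<>();

    /** A null is only counted in the cache; nothing reaches the page yet. */
    public void writeNull() {
        cachedNulls++;
    }

    /** A non-null value first flushes the cached nulls, then is written. */
    public void writeValue() {
        flushNullCache();
        valuesInCurrentPage++;
    }

    /** Moves all cached nulls into the current page. */
    public void flushNullCache() {
        valuesInCurrentPage += cachedNulls;
        cachedNulls = 0;
    }

    /** Cuts a page WITHOUT flushing the cache (the premature page write). */
    public void writePageWithoutFlush() {
        pageValueCounts.add(valuesInCurrentPage);
        valuesInCurrentPage = 0;
    }

    /** Explicit close: flush the cache, then write the last page. */
    public void close() {
        flushNullCache();
        writePageWithoutFlush();
    }

    public List<Integer> getPageValueCounts() {
        return pageValueCounts;
    }

    /** Writes only nulls and cuts a page every pageLimit records. */
    public static List<Integer> writeNulls(int nullCount, int pageLimit) {
        NullCacheSketch writer = new NullCacheSketch();
        for (int i = 1; i <= nullCount; i++) {
            writer.writeNull();
            if (i % pageLimit == 0) {
                writer.writePageWithoutFlush();
            }
        }
        writer.close();
        return writer.getPageValueCounts();
    }

    public static void main(String[] args) {
        // 50 nulls with a page cut every 20 records.
        System.out.println(writeNulls(50, 20)); // prints [0, 0, 50]
    }
}
```

In this model the early pages carry no values and the last page carries all of them, which matches the shape of the 1.12 dump (VC:0 for page 0, VC:111396 for page 19). Flushing the cache before each page cut would give every page its own nulls instead.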