[
https://issues.apache.org/jira/browse/FLINK-21397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Knauf updated FLINK-21397:
-------------------------------------
Priority: Critical (was: Blocker)
> BufferUnderflowException when read parquet
> -------------------------------------------
>
> Key: FLINK-21397
> URL: https://issues.apache.org/jira/browse/FLINK-21397
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.12.1
> Reporter: lihe ma
> Priority: Critical
> Attachments:
> part-f33924c5-99c3-4177-9a9a-e2d5c71a799a-1-2324.snappy.parquet
>
>
> Error when reading a parquet file.
> When the encoding of all pages in the parquet file is PLAIN_DICTIONARY, it works
> well. But if the parquet file contains 3 pages, and the encoding of page 0 and
> page 1 is PLAIN_DICTIONARY while page 2 is PLAIN, then Flink throws an exception
> after page 0 and page 1 finish reading.
> The source parquet file was written by Flink 1.11.
>
> The parquet file info:
> {{row group 0}}
> {{--------------------------------------------------------------------------------}}
> {{oid: BINARY SNAPPY DO:0 FPO:4 SZ:625876/1748820/2.79 VC:95192 ENC:BIT [more]...}}
> {{oid TV=95192 RL=0 DL=1 DS: 36972 DE:PLAIN_DICTIONARY}}
> {{----------------------------------------------------------------------------}}
> {{    page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY [more]... SZ:70314}}
> {{    page 1: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY [more]... SZ:74850}}
> {{    page 2: DLE:RLE RLE:BIT_PACKED VLE:PLAIN ST:[m [more]... SZ:568184}}
> {{BINARY oid}}
> Exception message:
> {code:java}
> Caused by: java.nio.BufferUnderflowException
> 	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
> 	at java.nio.ByteBuffer.get(ByteBuffer.java:715)
> 	at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:422)
> 	at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:77)
> 	at org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.readBatchFromDictionaryIds(BytesColumnReader.java:31)
> 	at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:186)
> 	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.nextBatch(ParquetVectorizedInputFormat.java:363)
> 	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat$ParquetReader.readBatch(ParquetVectorizedInputFormat.java:334)
> 	at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:71)
> 	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
> 	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
> 	... 6 more
> {code}
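The exception mechanism can be modeled in isolation. The following is a hypothetical minimal sketch, not the actual Flink code path: if a reader keeps decoding values as dictionary ids after the column's pages switch from PLAIN_DICTIONARY to PLAIN, it can end up requesting more bytes from the dictionary's backing `ByteBuffer` than the buffer holds, which is exactly the condition under which `ByteBuffer.get(byte[])` throws `java.nio.BufferUnderflowException`.

```java
import java.nio.ByteBuffer;
import java.nio.BufferUnderflowException;

// Hypothetical model of the failure mode: a stale dictionary buffer is read
// as if it still covered the values of a PLAIN-encoded page.
public class UnderflowDemo {
    public static void main(String[] args) {
        // Dictionary values accumulated from the PLAIN_DICTIONARY pages:
        // only 8 bytes in this toy example.
        ByteBuffer dictionary = ByteBuffer.allocate(8);

        // A PLAIN page stores raw values, not dictionary ids; misinterpreting
        // its bytes can produce a bogus read length larger than the buffer.
        int bogusLength = 16;

        boolean underflow = false;
        try {
            byte[] dst = new byte[bogusLength];
            dictionary.get(dst); // asks for 16 bytes from an 8-byte buffer
        } catch (BufferUnderflowException e) {
            underflow = true;
        }
        System.out.println("underflow=" + underflow);
    }
}
```

This only illustrates why `HeapByteBuffer.get` appears at the top of the stack trace; the real fix belongs in the Flink column readers, which must re-initialize their value decoder per page according to that page's encoding.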
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)