[ 
https://issues.apache.org/jira/browse/PARQUET-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Atul Felix Payapilly updated PARQUET-2367:
------------------------------------------
    Description: 
On Spark 3.3.1, which uses Parquet 1.12.2, Parquet files were successfully written using the default Parquet configs. Note: the write succeeded, so this is not the same as https://issues.apache.org/jira/browse/PARQUET-1632

 

The payload contained large strings, which resulted in the following exception on read:
{code:java}
Caused by: java.lang.NegativeArraySizeException
at 
org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:262)
at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:214)
at org.apache.parquet.bytes.BytesInput.toInputStream(BytesInput.java:223)
at 
org.apache.parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReaderImpl.java:592)
at 
org.apache.parquet.column.impl.ColumnReaderImpl.access$300(ColumnReaderImpl.java:57)
at 
org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:536)
at 
org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:533)
at org.apache.parquet.column.page.DataPageV1.accept(DataPageV1.java:95) {code}
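The stack trace points at {{BytesInput$StreamBytesInput.toByteArray}}. A plausible mechanism (an assumption on my part, not confirmed in this report) is that with large strings the buffered page bytes exceeded {{Integer.MAX_VALUE}}, and narrowing that size to an {{int}} for the {{byte[]}} allocation produced a negative value. A minimal sketch of that overflow, with a made-up size:
{code:java}
public class NegativeSizeDemo {
    public static void main(String[] args) {
        // Hypothetical accumulated page size just over 2 GiB (Integer.MAX_VALUE).
        long pageSizeBytes = 2_200_000_000L;

        // Narrowing a long > Integer.MAX_VALUE to int wraps around to a negative value.
        int truncated = (int) pageSizeBytes;
        System.out.println("truncated size = " + truncated); // negative

        try {
            // Allocating with a negative size throws, matching the reported exception.
            byte[] buf = new byte[truncated];
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
{code}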
The issue could be addressed with the following configs:
{code:java}
parquet.page.size.row.check.min=1
parquet.page.size.row.check.max=1000
parquet.page.size.check.estimate=false
spark.sql.parquet.columnarReaderBatchSize=2098 {code}
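One way to apply these settings at submit time (a sketch, assuming Spark's standard {{spark.hadoop.*}} passthrough for Hadoop/Parquet writer properties; the jar and class names are placeholders):
{code:bash}
# Placeholder app jar/class; the four --conf entries carry the workaround.
spark-submit \
  --conf spark.hadoop.parquet.page.size.row.check.min=1 \
  --conf spark.hadoop.parquet.page.size.row.check.max=1000 \
  --conf spark.hadoop.parquet.page.size.check.estimate=false \
  --conf spark.sql.parquet.columnarReaderBatchSize=2098 \
  --class com.example.MyApp \
  my-app.jar
{code}
The {{parquet.*}} keys are Hadoop configuration properties consumed by the Parquet writer, hence the {{spark.hadoop.}} prefix; {{spark.sql.parquet.columnarReaderBatchSize}} is a Spark SQL conf and is set directly.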
 

> NegativeArraySizeException on read for parquet files written with large 
> strings in some cases
> ---------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2367
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2367
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.12.2
>            Reporter: Atul Felix Payapilly
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
