[ https://issues.apache.org/jira/browse/PARQUET-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Atul Felix Payapilly updated PARQUET-2367:
------------------------------------------
Description:
On Spark 3.3.1, which uses Parquet 1.12.2, parquet files were successfully created using the default parquet configs. Note: the write succeeded, so this is not the same as https://issues.apache.org/jira/browse/PARQUET-1632

The payload had large strings, and reading the files back failed with the following exception:
{code:java}
Caused by: java.lang.NegativeArraySizeException
    at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:262)
    at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:214)
    at org.apache.parquet.bytes.BytesInput.toInputStream(BytesInput.java:223)
    at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReaderImpl.java:592)
    at org.apache.parquet.column.impl.ColumnReaderImpl.access$300(ColumnReaderImpl.java:57)
    at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:536)
    at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:533)
    at org.apache.parquet.column.page.DataPageV1.accept(DataPageV1.java:95)
{code}
The issue could be worked around with the following configs:
{code:java}
parquet.page.size.row.check.min=1
parquet.page.size.row.check.max=1000
parquet.page.size.check.estimate=false
spark.sql.parquet.columnarReaderBatchSize=2098
{code}

> NegativeArraySizeException on read for parquet files written with large strings in some cases
> ---------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2367
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2367
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.12.2
>            Reporter: Atul Felix Payapilly
>            Priority: Major
>
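The report does not identify the root cause, but a NegativeArraySizeException at BytesInput.toByteArray is consistent with a page's byte size overflowing a signed 32-bit int once a page of large strings grows past Integer.MAX_VALUE bytes. The sketch below illustrates that mechanism only; it is an assumption, not something established by this report, and the class and values are hypothetical:

```java
// Hypothetical illustration (assumption): a page size tracked as a long
// overflows when narrowed to int, yielding a negative array size on read.
public class NegativeSizeDemo {
    public static void main(String[] args) {
        long pageBytes = 3L * 1024 * 1024 * 1024; // a 3 GiB page, > Integer.MAX_VALUE
        int truncated = (int) pageBytes;          // narrowing conversion wraps negative
        System.out.println(truncated);            // prints -1073741824
        try {
            byte[] buffer = new byte[truncated];  // same exception class as the stack trace
        } catch (NegativeArraySizeException e) {
            System.out.println("caught " + e);
        }
    }
}
```

Under that reading, the workaround configs make sense: the parquet.page.size.row.check.* and parquet.page.size.check.estimate settings make the writer check page size more aggressively, and spark.sql.parquet.columnarReaderBatchSize shrinks the reader's batches, presumably keeping each page comfortably below the 2 GiB limit.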
-- This message was sent by Atlassian Jira (v8.20.10#820010)