[ https://issues.apache.org/jira/browse/PARQUET-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alosh Bennett updated PARQUET-244:
----------------------------------
    Description:
DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException soon after it has processed a new page via initFromPage(). This issue can be reproduced by trying to read a Binary column that is encoded using delta byte array and spans multiple pages.

This is happening because ColumnReaderImpl.initDataReader() creates a new ValueReader every time a new page is processed (see _this.dataColumn = dataEncoding.getValuesReader(path, VALUES)_). The DeltaByteArrayReader is stateful and needs to remember the _previous_ Binary value across pages. When a new DeltaByteArrayReader is created, this information is lost.

  was:
DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException soon after it has processed a new page via initFromPage().

This is happening because

> DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving across pages
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-244
>                 URL: https://issues.apache.org/jira/browse/PARQUET-244
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Alosh Bennett
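
The failure mode described above can be illustrated with a small self-contained sketch. This is a toy model, not the parquet-mr code: the class names (DeltaPageDemo, ToyDeltaReader, Entry), the in-memory page layout, and the sample data are all hypothetical. It assumes only what the report states, namely that each value is rebuilt from the previously decoded value and that this dependency can cross a page boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of prefix-delta decoding. All names here are hypothetical;
// this is not the parquet-mr implementation.
final class DeltaPageDemo {

    // One encoded value: bytes to reuse from the previous value, plus a new suffix.
    static final class Entry {
        final int prefixLength;
        final byte[] suffix;

        Entry(int prefixLength, String suffix) {
            this.prefixLength = prefixLength;
            this.suffix = suffix.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Stateful reader: every value is rebuilt from the one decoded before it.
    static final class ToyDeltaReader {
        private byte[] previous = new byte[0];
        private List<Entry> page = new ArrayList<>();
        private int position;

        // Switches to a new page but deliberately keeps 'previous'.
        void initFromPage(List<Entry> newPage) {
            page = newPage;
            position = 0;
        }

        String readBytes() {
            Entry entry = page.get(position++);
            byte[] value = new byte[entry.prefixLength + entry.suffix.length];
            // Throws if 'previous' is shorter than the requested prefix.
            System.arraycopy(previous, 0, value, 0, entry.prefixLength);
            System.arraycopy(entry.suffix, 0, value, entry.prefixLength, entry.suffix.length);
            previous = value;
            return new String(value, StandardCharsets.UTF_8);
        }
    }

    // Encodes "apple", "apply" on page 1 and "apricot", "banana" on page 2.
    static List<List<Entry>> samplePages() {
        List<Entry> page1 = Arrays.asList(new Entry(0, "apple"), new Entry(4, "y"));
        List<Entry> page2 = Arrays.asList(new Entry(2, "ricot"), new Entry(0, "banana"));
        return Arrays.asList(page1, page2);
    }

    // One reader for the whole column: state survives the page boundary.
    static List<String> readWithOneReader(List<List<Entry>> pages) {
        List<String> values = new ArrayList<>();
        ToyDeltaReader reader = new ToyDeltaReader();
        for (List<Entry> page : pages) {
            reader.initFromPage(page);
            for (int i = 0; i < page.size(); i++) {
                values.add(reader.readBytes());
            }
        }
        return values;
    }

    // New reader per page, mirroring the pattern the report describes.
    static List<String> readWithFreshReaderPerPage(List<List<Entry>> pages) {
        List<String> values = new ArrayList<>();
        for (List<Entry> page : pages) {
            ToyDeltaReader reader = new ToyDeltaReader(); // 'previous' is empty again
            reader.initFromPage(page);
            for (int i = 0; i < page.size(); i++) {
                values.add(reader.readBytes());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readWithOneReader(samplePages()));
        try {
            readWithFreshReaderPerPage(samplePages());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("fresh reader per page failed: " + e);
        }
    }
}
```

In this sketch the first entry of the second page asks for a two-byte prefix of the previous value. The single long-lived reader still holds "apply" and can supply it, while the freshly created reader starts with an empty previous value and fails inside System.arraycopy. Whether the right fix is to reuse the reader or to hand the last value to the new one is a design decision for parquet-mr that this example does not settle.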