[ 
https://issues.apache.org/jira/browse/PARQUET-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alosh Bennett updated PARQUET-244:
----------------------------------
    Description: 
DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException soon 
after it has processed a new page via initFromPage(). This issue can be 
reproduced by trying to read a Binary column that is encoded using delta byte 
array and spans multiple pages.

This happens because ColumnReaderImpl.initDataReader() creates a new 
ValuesReader every time a new page is processed (see _this.dataColumn = 
dataEncoding.getValuesReader(path, VALUES)_). DeltaByteArrayReader is 
stateful: it must remember the _previous_ Binary value across pages. When a 
new DeltaByteArrayReader is created, that value is lost.
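
The failure mode described above can be illustrated with a small, self-contained 
sketch. This is not parquet-mr code: ToyDeltaReader, its read(prefixLength, suffix) 
method, and the sample values are hypothetical stand-ins, assuming only that each 
value is stored as a prefix length (relative to the previous value) plus a suffix. 
It shows why a reader created fresh for the second page may fail, and why carrying 
the previous value over avoids the failure.

```java
import java.nio.charset.StandardCharsets;

public class DeltaPrefixSketch {

    /** Hypothetical toy reader: each value is (prefixLength, suffix) relative to the previous value. */
    static final class ToyDeltaReader {
        private byte[] previous;

        ToyDeltaReader(byte[] previous) {
            this.previous = previous;
        }

        byte[] read(int prefixLength, String suffix) {
            byte[] suffixBytes = suffix.getBytes(StandardCharsets.UTF_8);
            byte[] out = new byte[prefixLength + suffixBytes.length];
            // Copies the shared prefix from the previous value. If 'previous' is shorter
            // than prefixLength, System.arraycopy throws an IndexOutOfBoundsException
            // (typically an ArrayIndexOutOfBoundsException on common JVMs).
            System.arraycopy(previous, 0, out, 0, prefixLength);
            System.arraycopy(suffixBytes, 0, out, prefixLength, suffixBytes.length);
            previous = out;
            return out;
        }

        byte[] lastValue() {
            return previous;
        }
    }

    /** Mimics creating a brand-new reader for page 2: the previous value is lost. */
    public static boolean freshReaderPerPageFails() {
        ToyDeltaReader page1 = new ToyDeltaReader(new byte[0]);
        page1.read(0, "apple");
        page1.read(3, "ly"); // "app" + "ly"

        ToyDeltaReader page2 = new ToyDeltaReader(new byte[0]); // state dropped
        try {
            page2.read(2, "ricot"); // needs 2 bytes of a previous value that no longer exists
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Mimics handing the last value of page 1 to the reader for page 2. */
    public static String carriedStateDecodes() {
        ToyDeltaReader page1 = new ToyDeltaReader(new byte[0]);
        page1.read(0, "apple");
        page1.read(3, "ly");

        ToyDeltaReader page2 = new ToyDeltaReader(page1.lastValue());
        return new String(page2.read(2, "ricot"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("fresh reader fails: " + freshReaderPerPageFails());
        System.out.println("carried state decodes: " + carriedStateDecodes());
    }
}
```

Whether the actual fix should reuse one reader across pages or pass the previous 
value into the new reader is a design choice this sketch does not settle; it only 
shows that the previous value has to survive the page boundary.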


  was:
DeltaByteArrayReader.readBytes() fails with  ArrayIndexOutOfBoundsException 
soon after it has processed a new page via initFromPage(). This is happening 
because 



> DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving 
> across pages
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-244
>                 URL: https://issues.apache.org/jira/browse/PARQUET-244
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Alosh Bennett
>
> DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException 
> soon after it has processed a new page via initFromPage(). This issue can be 
> reproduced by trying to read a Binary column that is encoded using delta byte 
> array and spans multiple pages.
> This happens because ColumnReaderImpl.initDataReader() creates a new 
> ValuesReader every time a new page is processed (see _this.dataColumn = 
> dataEncoding.getValuesReader(path, VALUES)_). DeltaByteArrayReader is 
> stateful: it must remember the _previous_ Binary value across pages. When 
> a new DeltaByteArrayReader is created, that value is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
