[ 
https://issues.apache.org/jira/browse/PARQUET-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ence Wang updated PARQUET-2424:
-------------------------------
    Description: 
When we were writing an encrypted file, we encountered the following error:
{code:java}
Encrypted parquet files can't have more than 32767 pages per chunk: 32768
{code}
 

*Error Stack:*
{code:java}
org.apache.parquet.crypto.ParquetCryptoRuntimeException: Encrypted parquet files can't have more than 32767 pages per chunk: 32768
        at org.apache.parquet.crypto.AesCipher.quickUpdatePageAAD(AesCipher.java:131)
        at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:178)
        at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:67)
        at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:392)
        at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:231)
        at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:216)
        at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
        at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endMessage(MessageColumnIO.java:295)
{code}
 

*Reasons:*
The `getBufferedSize` method of 
[FallbackValuesWriter|https://github.com/apache/parquet-mr/blob/19f284355847696fa254c789ab93c42db9af5982/parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java#L73]
returns the raw data size, which is used to decide whether to flush the page, 
so the page actually written can be much smaller because of dictionary 
encoding. This prevents a page from being too big when fallback happens, but 
it can also produce too many pages in a single column chunk. On the other 
hand, the encryption module supports at most 32767 pages per chunk, because we 
use a `Short` to store the page ordinal as part of the 
[AAD|https://github.com/apache/parquet-format/blob/master/Encryption.md#442-aad-suffix].
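
The two effects described above can be illustrated with a minimal sketch. It is not the parquet-mr implementation: the class `PageOrdinalSketch` and both helper methods are hypothetical names, the little-endian 2-byte layout is my reading of the linked AAD suffix section, and the chunk size and page threshold are made-up numbers chosen only to show how a flush decision based on raw size can push the page count past what a `Short` can hold.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only; none of these names exist in parquet-mr.
class PageOrdinalSketch {

    // Hypothetical helper: encode a page ordinal as a 2-byte little-endian
    // short (assumed AAD layout), rejecting values that do not fit.
    static byte[] encodePageOrdinal(int pageOrdinal) {
        if (pageOrdinal < 0 || pageOrdinal > Short.MAX_VALUE) {
            throw new IllegalArgumentException(
                "page ordinal does not fit in a short: " + pageOrdinal);
        }
        return ByteBuffer.allocate(2)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putShort((short) pageOrdinal)
            .array();
    }

    // Hypothetical estimate: if a page is flushed every time the RAW buffered
    // size reaches the threshold, the page count depends only on the raw size,
    // no matter how small the dictionary-encoded pages end up being.
    static long estimatePageCount(long rawBytesInChunk, long pageSizeThreshold) {
        return (rawBytesInChunk + pageSizeThreshold - 1) / pageSizeThreshold;
    }

    public static void main(String[] args) {
        long rawBytes = 40L * 1024 * 1024 * 1024; // assumed raw size of one column chunk
        long threshold = 1024L * 1024;            // assumed page size threshold
        long pages = estimatePageCount(rawBytes, threshold);
        System.out.println("estimated pages: " + pages);
        System.out.println("exceeds Short.MAX_VALUE: " + (pages > Short.MAX_VALUE));

        // The largest ordinal that still fits is accepted; the next one is rejected.
        encodePageOrdinal(Short.MAX_VALUE);
        try {
            encodePageOrdinal(Short.MAX_VALUE + 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, a highly compressible column keeps growing the chunk (the encoded data stays small) while pages are still cut by raw size, so the ordinal eventually overflows the 2-byte field.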
 
 
*Reproduce:*
*[^reproduce.zip]*


> Encrypted parquet files can't have more than 32767 pages per chunk: 32768
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-2424
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2424
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.13.1
>            Reporter: Ence Wang
>            Priority: Major
>         Attachments: reproduce.zip
>


