    [ https://issues.apache.org/jira/browse/PARQUET-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ence Wang updated PARQUET-2424:
-------------------------------
    Attachment: image-2024-02-04-19-21-41-207.png

> Encrypted parquet files can't have more than 32767 pages per chunk: 32768
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-2424
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2424
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.13.1
>            Reporter: Ence Wang
>            Priority: Major
>         Attachments: image-2024-02-04-19-21-41-207.png, reproduce.zip
>
>
> When we were writing an encrypted file, we encountered the following error:
> {code:java}
> Encrypted parquet files can't have more than 32767 pages per chunk: 32768
> {code}
>  
> *Error Stack:*
> {code:java}
> org.apache.parquet.crypto.ParquetCryptoRuntimeException: Encrypted parquet files can't have more than 32767 pages per chunk: 32768
>         at org.apache.parquet.crypto.AesCipher.quickUpdatePageAAD(AesCipher.java:131)
>         at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:178)
>         at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:67)
>         at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:392)
>         at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:231)
>         at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:216)
>         at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
>         at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endMessage(MessageColumnIO.java:295)
> {code}
>  
> *Reasons:*
> The `getBufferedSize` method of 
> [FallbackValuesWriter|https://github.com/apache/parquet-mr/blob/19f284355847696fa254c789ab93c42db9af5982/parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java#L73]
> returns the raw data size, which is used to decide whether to flush the page, 
> so the page actually written can be much smaller than the threshold due to 
> dictionary encoding. This prevents pages from becoming too big when fallback 
> happens, but it can also produce too many pages in a single column chunk. On 
> the other hand, the encryption module only supports up to 32767 pages per 
> chunk, because it uses a `Short` to store the page ordinal as part of the 
> [AAD|https://github.com/apache/parquet-format/blob/master/Encryption.md#442-aad-suffix].
>  
>  
> *Reproduce:*
> *[^reproduce.zip]*
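
To illustrate the two effects described under *Reasons*, here is a minimal sketch. It is not the parquet-mr code: the class `PageOrdinalSketch` and its helpers `encodePageOrdinal` and `estimatePages` are hypothetical, the 2-byte little-endian layout is assumed from the linked AAD suffix description, and the sizes used in `main` are made-up numbers, not taken from the attached reproduction.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only; not the parquet-mr implementation.
final class PageOrdinalSketch {

    // A Java short is a signed 16-bit value, so its largest value is 32767.
    static final int MAX_PAGE_ORDINAL = Short.MAX_VALUE;

    // Hypothetical helper: encodes a page ordinal as 2 little-endian bytes
    // and rejects values that do not fit in a short.
    static byte[] encodePageOrdinal(int pageOrdinal) {
        if (pageOrdinal < 0 || pageOrdinal > MAX_PAGE_ORDINAL) {
            throw new IllegalArgumentException(
                "Page ordinal does not fit in a short: " + pageOrdinal);
        }
        return ByteBuffer.allocate(2)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putShort((short) pageOrdinal)
            .array();
    }

    // Hypothetical estimate: number of pages produced when a page is flushed
    // each time the raw (unencoded) buffered size reaches the threshold,
    // regardless of how small the encoded page turns out to be.
    static long estimatePages(long rawChunkBytes, long pageSizeThreshold) {
        return (rawChunkBytes + pageSizeThreshold - 1) / pageSizeThreshold;
    }

    public static void main(String[] args) {
        // Assumed numbers: 1 MiB page threshold, 40 GiB of raw data in one chunk.
        long pages = estimatePages(40L * 1024 * 1024 * 1024, 1024 * 1024);
        System.out.println("Estimated pages: " + pages);
        System.out.println("Within limit: " + (pages <= MAX_PAGE_ORDINAL));
    }
}
```

Under these assumed numbers the page count is driven by the raw size alone, so a chunk that dictionary-encodes very well can still exceed the limit that the 2-byte ordinal imposes.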



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
