[ https://issues.apache.org/jira/browse/PARQUET-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813533#comment-17813533 ]
Gidon Gershinsky commented on PARQUET-2424:
-------------------------------------------
We might be able to double the limit to 64K (this needs to be checked), but
the question is whether that will be sufficient for your use case,
[~encewang]. Can you find or estimate the maximum number of pages per column
chunk in your data (without encryption)?
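
To make the "double the limit" idea concrete, here is a minimal sketch of the arithmetic only. It assumes the page ordinal occupies two little-endian bytes in the AAD suffix; the class and method names are hypothetical and this is not the parquet-mr code. Read as a signed short, two bytes cover ordinals 0..32767; read as unsigned, the same two bytes cover 0..65535.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; names are not taken from parquet-mr.
class PageOrdinalSketch {

    // Signed interpretation: ordinals above Short.MAX_VALUE (32767) are rejected.
    static byte[] encodeSigned(int pageOrdinal) {
        if (pageOrdinal < 0 || pageOrdinal > Short.MAX_VALUE) {
            throw new IllegalArgumentException(
                "ordinal does not fit a signed short: " + pageOrdinal);
        }
        return toTwoBytesLE(pageOrdinal);
    }

    // Unsigned interpretation: the same two bytes can carry ordinals up to 65535.
    static byte[] encodeUnsigned(int pageOrdinal) {
        if (pageOrdinal < 0 || pageOrdinal > 0xFFFF) {
            throw new IllegalArgumentException(
                "ordinal does not fit an unsigned short: " + pageOrdinal);
        }
        return toTwoBytesLE(pageOrdinal);
    }

    // Decodes two little-endian bytes as an unsigned 16-bit value.
    static int decodeUnsigned(byte[] twoBytes) {
        short raw = ByteBuffer.wrap(twoBytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
        return Short.toUnsignedInt(raw);
    }

    // Keeps only the low 16 bits of the value, written in little-endian order.
    private static byte[] toTwoBytesLE(int value) {
        return ByteBuffer.allocate(2)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putShort((short) value)
            .array();
    }

    public static void main(String[] args) {
        System.out.println(decodeUnsigned(encodeUnsigned(40000)));
        try {
            encodeSigned(32768);
        } catch (IllegalArgumentException e) {
            System.out.println("signed encoding rejected 32768");
        }
    }
}
```

Whether existing readers would interpret the two bytes the same way is exactly the part that still needs to be checked.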
> Encrypted parquet files can't have more than 32767 pages per chunk: 32768
> -------------------------------------------------------------------------
>
> Key: PARQUET-2424
> URL: https://issues.apache.org/jira/browse/PARQUET-2424
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.13.1
> Reporter: Ence Wang
> Priority: Major
> Attachments: reproduce.zip
>
>
> When we were writing an encrypted file, we encountered the following error:
> {code:java}
> Encrypted parquet files can't have more than 32767 pages per chunk: 32768
> {code}
>
> *Error Stack:*
> {code:java}
> org.apache.parquet.crypto.ParquetCryptoRuntimeException: Encrypted parquet files can't have more than 32767 pages per chunk: 32768
>   at org.apache.parquet.crypto.AesCipher.quickUpdatePageAAD(AesCipher.java:131)
>   at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:178)
>   at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:67)
>   at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:392)
>   at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:231)
>   at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:216)
>   at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
>   at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endMessage(MessageColumnIO.java:295)
> {code}
>
> *Reasons:*
> The `getBufferedSize` method of
> [FallbackValuesWriter|https://github.com/apache/parquet-mr/blob/19f284355847696fa254c789ab93c42db9af5982/parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java#L73]
> returns the raw data size, which is used to decide whether to flush the page,
> so the page actually written can be much smaller because of dictionary
> encoding. This prevents pages from becoming too big when fallback happens,
> but it can also produce too many pages in a single column chunk. On the other
> hand, the encryption module supports at most 32767 pages per chunk, because
> it uses a `Short` to store the page ordinal as part of the
> [AAD|https://github.com/apache/parquet-format/blob/master/Encryption.md#442-aad-suffix].
>
>
> *Reproduce:*
> *[^reproduce.zip]*
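
For a rough estimate of the page count, the "Reasons" section of the quoted issue suggests a simple model: a page is flushed whenever the raw (pre-encoding) buffered size reaches the page size threshold, regardless of how small the dictionary-encoded page ends up. The sketch below applies that model. The class name, method names, and the example numbers are assumptions for illustration, and the model ignores any other flush triggers, so treat the result as an approximation.

```java
// Hypothetical estimate; not part of parquet-mr.
class PageCountEstimate {

    // Approximate pages per column chunk: raw bytes divided by the
    // page size threshold, rounded up.
    static long estimatePages(long rawBytesInChunk, long pageSizeThresholdBytes) {
        if (rawBytesInChunk < 0 || pageSizeThresholdBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and threshold positive");
        }
        return (rawBytesInChunk + pageSizeThresholdBytes - 1) / pageSizeThresholdBytes;
    }

    // True when the estimate exceeds the 32767-page limit from the error message.
    static boolean exceedsEncryptedLimit(long pages) {
        return pages > Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        long rawBytes = 40L * 1024 * 1024 * 1024; // assumed: 40 GiB of raw values in one chunk
        long threshold = 1024L * 1024;            // assumed: 1 MiB page size threshold
        long pages = estimatePages(rawBytes, threshold);
        System.out.println(pages + " pages, over limit: " + exceedsEncryptedLimit(pages));
        // prints "40960 pages, over limit: true"
    }
}
```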