[ https://issues.apache.org/jira/browse/PARQUET-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631252#comment-17631252 ]
ASF GitHub Bot commented on PARQUET-2212:
-----------------------------------------
parthchandra opened a new pull request, #1008:
URL: https://github.com/apache/parquet-mr/pull/1008
The PR adds the new ByteBuffer API and also updates
ColumnChunkPageReadStore.readPage to use it.
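The updated `readPage` itself is not shown in this thread. The minimal Java sketch below illustrates the dispatch that the issue describes: the ByteBuffer method is used for direct buffers and the byte-array method for heap buffers. `PageDecryptor` and `DecryptDispatch` are hypothetical names, not parquet-mr classes. The interface simply pairs the existing `byte[]` signature with the proposed `ByteBuffer` one.

```java
import java.nio.ByteBuffer;

// Hypothetical interface: the existing byte[] method plus the proposed ByteBuffer method.
interface PageDecryptor {
  byte[] decrypt(byte[] lengthAndCiphertext, byte[] aad);
  ByteBuffer decrypt(ByteBuffer from, byte[] aad);
}

// Hypothetical helper sketching how a page reader could choose between the two methods.
final class DecryptDispatch {
  private DecryptDispatch() {}

  static ByteBuffer decryptPage(ByteBuffer page, PageDecryptor decryptor, byte[] aad) {
    if (page.isDirect()) {
      // Off-heap memory: hand the buffer over as-is, with no byte[] copy.
      return decryptor.decrypt(page, aad);
    }
    byte[] input;
    if (page.hasArray() && page.arrayOffset() == 0 && page.position() == 0
        && page.remaining() == page.array().length) {
      // Heap buffer covering its whole backing array: pass the array itself.
      input = page.array();
    } else {
      // Sliced or read-only heap buffer: copy just the readable range.
      input = new byte[page.remaining()];
      page.duplicate().get(input);
    }
    return ByteBuffer.wrap(decryptor.decrypt(input, aad));
  }
}
```

The range-copy fallback for sliced or read-only heap buffers is a choice made in this sketch. A byte-array method that accepted an offset and length would avoid that copy.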
A few additional classes were touched (ParquetReader.Builder, BytesInput) to
allow an allocator to be specified and/or to avoid ByteBuffer -> byte array
copying. These changes were necessary to enable the unit test.
A user option has been added to explicitly enable or disable the use of the
ByteBuffer API for decryption.
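This thread does not give the option's name. The minimal sketch below shows one way it could combine with the JDK 17 rule from the issue. It assumes the option is modeled as a tri-state `Boolean` (`null` when unset) and that an explicit setting, including `false`, takes precedence over the JDK check. `ByteBufferDecryptPolicy` is a hypothetical helper. The JDK version is read from the standard `java.specification.version` system property, which is also available on Java 8.

```java
// Hypothetical helper, not parquet-mr code. The option semantics below are assumptions.
final class ByteBufferDecryptPolicy {
  private ByteBufferDecryptPolicy() {}

  // java.specification.version is "1.8" on Java 8 and "9", "11", "17", ... afterwards.
  static int featureVersion(String specVersion) {
    String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    int dot = v.indexOf('.');
    return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
  }

  static int currentFeatureVersion() {
    return featureVersion(System.getProperty("java.specification.version", "1.8"));
  }

  // userSetting: null = option not set; TRUE/FALSE = explicit enable/disable (assumed).
  static boolean useByteBufferApi(boolean isDirect, Boolean userSetting, int jdkFeatureVersion) {
    if (!isDirect) {
      // Heap buffers stay on the byte[] API, which can usually use the backing array.
      return false;
    }
    if (userSetting != null) {
      return userSetting;
    }
    // Unset: rely on the JDK 17 threshold for hardware-accelerated ByteBuffer decryption.
    return jdkFeatureVersion >= 17;
  }
}
```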
### Jira
- My PR addresses [PARQUET-2212](https://issues.apache.org/jira/browse/PARQUET-2212)
### Tests
- Updates unit test(s) in
`org.apache.parquet.crypto.TestPropertiesDrivenEncryption`
> Add ByteBuffer api for decryptors to allow direct memory to be decrypted
> ------------------------------------------------------------------------
>
> Key: PARQUET-2212
> URL: https://issues.apache.org/jira/browse/PARQUET-2212
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.12.3
> Reporter: Parth Chandra
> Priority: Major
> Fix For: 1.12.3
>
>
> The decrypt API in BlockCipher.Decryptor currently provides only a method that
> takes a byte array:
> {code:java}
> byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);{code}
> A Parquet reader that uses the DirectByteBufferAllocator must copy the data
> into a byte array (and sometimes back into a DirectByteBuffer) before it can
> decrypt it.
> This issue proposes adding a new method that accepts a ByteBuffer as input
> and avoids the data copy:
> {code:java}
> ByteBuffer decrypt(ByteBuffer from, byte[] AAD);{code}
> The decryption in ColumnChunkPageReadStore can also be updated to use the
> ByteBuffer-based API when the buffer is a DirectByteBuffer. When the buffer is
> a HeapByteBuffer, we can continue to use the byte array API, because accessing
> the underlying byte array does not incur a copy.
> Also, some investigation has shown that decryption with ByteBuffers cannot
> use hardware acceleration in JVMs before JDK 17. On those JVMs, overall
> decryption is faster with byte arrays, even after paying for the copy.
> The proposal, then, is to use the ByteBuffer API for DirectByteBuffers only,
> and only if the JDK is JDK 17 or later or the user explicitly configures it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)