Parth Chandra created PARQUET-2212:
--------------------------------------
Summary: Add ByteBuffer api for decryptors to allow direct memory
to be decrypted
Key: PARQUET-2212
URL: https://issues.apache.org/jira/browse/PARQUET-2212
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.12.3
Reporter: Parth Chandra
Fix For: 1.12.3
The decrypt API in BlockCipher.Decryptor currently provides only a method that
takes a byte array:
{code:java}
byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);{code}
A parquet reader that uses the DirectByteBufferAllocator has to incur the cost
of copying the data into a byte array (and sometimes back to a
DirectByteBuffer) to decrypt data.
This issue proposes adding a new API that accepts a ByteBuffer as input and
avoids the data copy:
{code:java}
ByteBuffer decrypt(ByteBuffer from, byte[] AAD);{code}
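
To illustrate the copy-free path, here is a minimal sketch of AES-GCM decryption that reads from and writes to direct ByteBuffers using the standard javax.crypto.Cipher.doFinal(ByteBuffer, ByteBuffer) call. The class name ByteBufferGcmSketch, the encrypt helper, and the buffer layout (a 12-byte nonce followed by ciphertext and tag) are assumptions made for this example; this is not the parquet-mr implementation and it does not reproduce Parquet's on-disk module layout.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, not the parquet-mr code.
final class ByteBufferGcmSketch {
    static final int NONCE_LENGTH = 12;      // assumed nonce size in bytes
    static final int TAG_LENGTH_BITS = 128;  // assumed GCM tag size

    // Helper for round-trip testing: returns a direct buffer holding nonce | ciphertext+tag.
    static ByteBuffer encrypt(byte[] key, byte[] nonce, byte[] plaintext, byte[] aad) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
            if (aad != null) {
                cipher.updateAAD(aad);
            }
            byte[] ciphertext = cipher.doFinal(plaintext);
            ByteBuffer out = ByteBuffer.allocateDirect(NONCE_LENGTH + ciphertext.length);
            out.put(nonce).put(ciphertext);
            out.flip();
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Decrypts nonce | ciphertext+tag from 'from' without copying the payload into a byte[].
    static ByteBuffer decrypt(byte[] key, ByteBuffer from, byte[] aad) {
        try {
            ByteBuffer in = from.duplicate();  // leave the caller's position untouched
            byte[] nonce = new byte[NONCE_LENGTH];
            in.get(nonce);                     // only the small nonce is copied
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
            if (aad != null) {
                cipher.updateAAD(aad);
            }
            ByteBuffer out = ByteBuffer.allocateDirect(cipher.getOutputSize(in.remaining()));
            cipher.doFinal(in, out);           // ByteBuffer-to-ByteBuffer decryption
            out.flip();
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

The sketch allocates a fresh output buffer for simplicity; a real reader would more likely obtain it from its configured allocator.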
The decryption in ColumnChunkPageReadStore can also be updated to use the
ByteBuffer-based API if the buffer is a DirectByteBuffer. If the buffer is a
HeapByteBuffer, we can continue to use the byte array API, since accessing the
underlying byte array does not incur a copy.
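
The direct-versus-heap dispatch described above could look roughly like the following sketch. The Decryptor interface here is a hypothetical stand-in that mirrors the two signatures quoted in this issue, and decryptPage is an invented helper name; neither is taken from ColumnChunkPageReadStore.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of choosing a decrypt path by buffer type.
final class DecryptDispatchSketch {
    // Stand-in for BlockCipher.Decryptor with both the existing and the proposed method.
    interface Decryptor {
        byte[] decrypt(byte[] lengthAndCiphertext, byte[] aad);
        ByteBuffer decrypt(ByteBuffer from, byte[] aad);
    }

    static ByteBuffer decryptPage(Decryptor decryptor, ByteBuffer page, byte[] aad) {
        if (page.isDirect()) {
            // Direct memory: use the ByteBuffer API and avoid copying to the heap.
            return decryptor.decrypt(page, aad);
        }
        byte[] input;
        boolean coversWholeArray = page.hasArray()
                && page.arrayOffset() == 0
                && page.position() == 0
                && page.remaining() == page.array().length;
        if (coversWholeArray) {
            // Heap buffer backed by exactly this array: no copy needed.
            input = page.array();
        } else {
            // Sliced or read-only heap buffer: fall back to a copy.
            input = new byte[page.remaining()];
            page.duplicate().get(input);
        }
        return ByteBuffer.wrap(decryptor.decrypt(input, aad));
    }
}
```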
Also, some investigation has shown that decryption with ByteBuffers is not able
to use hardware acceleration in JVMs before JDK 17. In those cases, the overall
decryption speed is faster with byte arrays even after incurring the overhead
of making a copy.
The proposal, then, is to enable the use of the ByteBuffer API for
DirectByteBuffers only, and only if the JDK is JDK 17 or higher or the user
explicitly configures it.
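
The gating rule in the proposal can be written as a small predicate. This is only a sketch: the class name ByteBufferDecryptGate, the userEnabled flag, and the version-parsing helper are hypothetical, and no actual Parquet configuration key is implied.

```java
// Hypothetical sketch of the proposed enablement rule.
final class ByteBufferDecryptGate {
    // Parses a java.specification.version value such as "1.8", "11" or "17".
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    // ByteBuffer path only for direct buffers, and only on JDK 17+ or when the user opts in.
    static boolean useByteBufferApi(boolean isDirect, int jdkFeatureVersion, boolean userEnabled) {
        return isDirect && (jdkFeatureVersion >= 17 || userEnabled);
    }

    // Convenience overload that reads the running JVM's version.
    static boolean useByteBufferApi(boolean isDirect, boolean userEnabled) {
        int jdk = featureVersion(System.getProperty("java.specification.version"));
        return useByteBufferApi(isDirect, jdk, userEnabled);
    }
}
```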
--
This message was sent by Atlassian Jira
(v8.20.10#820010)