Parth Chandra created PARQUET-2212:
--------------------------------------

             Summary: Add ByteBuffer api for decryptors to allow direct memory 
to be decrypted
                 Key: PARQUET-2212
                 URL: https://issues.apache.org/jira/browse/PARQUET-2212
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.12.3
            Reporter: Parth Chandra
             Fix For: 1.12.3


BlockCipher.Decryptor currently provides only a decrypt method that takes a 
byte array:
{code:java}
byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);{code}
A Parquet reader that uses the DirectByteBufferAllocator must copy the data 
into a byte array (and sometimes back into a DirectByteBuffer) in order to 
decrypt it.

This issue proposes adding a new API that accepts a ByteBuffer as input and 
avoids the data copy:
{code:java}
ByteBuffer decrypt(ByteBuffer from, byte[] AAD);{code}
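
As a rough illustration of what a ByteBuffer-based decrypt could look like, the sketch below uses the JDK's Cipher.doFinal(ByteBuffer, ByteBuffer) with AES-GCM. It is not the parquet-mr implementation: the class name, the explicit key and nonce parameters, and the nonce and tag sizes are assumptions made to keep the example self-contained. The proposed method takes only the buffer and the AAD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; names and parameters are illustrative only.
class ByteBufferDecryptSketch {
  static final int NONCE_LENGTH = 12;      // assumed nonce size in bytes
  static final int TAG_LENGTH_BITS = 128;  // assumed GCM tag size in bits

  // Decrypts the remaining bytes of 'from' (ciphertext followed by the GCM tag)
  // into a newly allocated direct buffer, without going through a byte[].
  static ByteBuffer decrypt(byte[] key, byte[] nonce, ByteBuffer from, byte[] aad)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
        new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
    if (aad != null) {
      cipher.updateAAD(aad);
    }
    ByteBuffer to = ByteBuffer.allocateDirect(cipher.getOutputSize(from.remaining()));
    cipher.doFinal(from, to);  // reads from 'from', writes plaintext into 'to'
    to.flip();                 // make the plaintext readable from position 0
    return to;
  }

  public static void main(String[] args) throws GeneralSecurityException {
    byte[] key = new byte[16];              // all-zero demo key, for illustration only
    byte[] nonce = new byte[NONCE_LENGTH];  // fixed demo nonce, for illustration only
    byte[] aad = "demo-aad".getBytes(StandardCharsets.UTF_8);

    // Produce some ciphertext with the byte[] API so there is something to decrypt.
    Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
    enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
    enc.updateAAD(aad);
    byte[] ciphertext = enc.doFinal("page bytes".getBytes(StandardCharsets.UTF_8));

    // Place the ciphertext in direct memory and decrypt it from there.
    ByteBuffer direct = ByteBuffer.allocateDirect(ciphertext.length);
    direct.put(ciphertext);
    direct.flip();
    ByteBuffer plain = decrypt(key, nonce, direct, aad);
    System.out.println(StandardCharsets.UTF_8.decode(plain));
  }
}
```

In a real decryptor the key would be held by the Decryptor instance and the nonce would be read from the input buffer itself, so neither would appear in the method signature.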

The decryption in ColumnChunkPageReadStore can also be updated to use the 
ByteBuffer-based API if the buffer is a DirectByteBuffer. If the buffer is a 
HeapByteBuffer, we can continue to use the byte array API, since accessing the 
underlying byte array does not incur a copy.
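
The dispatch described above might look roughly like the following. This is only a sketch: the nested Decryptor interface is a hypothetical stand-in for BlockCipher.Decryptor with both overloads, and decryptPage is an invented helper name, not a method in ColumnChunkPageReadStore.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of choosing between the two decrypt overloads.
class DecryptDispatchSketch {
  // Stand-in for BlockCipher.Decryptor, assuming the proposed overload exists.
  interface Decryptor {
    byte[] decrypt(byte[] lengthAndCiphertext, byte[] aad);
    ByteBuffer decrypt(ByteBuffer from, byte[] aad);
  }

  static ByteBuffer decryptPage(Decryptor decryptor, ByteBuffer page, byte[] aad) {
    if (page.isDirect()) {
      // Direct memory: use the ByteBuffer overload and skip the copy to a byte[].
      return decryptor.decrypt(page, aad);
    }
    byte[] input;
    boolean coversWholeArray = page.hasArray()
        && page.arrayOffset() == 0
        && page.position() == 0
        && page.remaining() == page.array().length;
    if (coversWholeArray) {
      // Heap buffer backed by exactly this array: hand it over without copying.
      input = page.array();
    } else {
      // Slice or read-only heap buffer: a copy is unavoidable with the byte[] API.
      input = new byte[page.remaining()];
      page.duplicate().get(input);
    }
    return ByteBuffer.wrap(decryptor.decrypt(input, aad));
  }
}
```

The extra check on offset and length reflects that the byte array overload takes a whole array, so the zero-copy path applies only when the heap buffer spans its backing array exactly.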

Also, some investigation has shown that decryption with ByteBuffers is not able 
to use hardware acceleration in JVMs before JDK 17. In those cases, the overall 
decryption speed is faster with byte arrays even after incurring the overhead 
of making a copy.

The proposal, then, is to enable the use of the ByteBuffer API for 
DirectByteBuffers only, and only if the JDK is JDK 17 or higher or the user 
explicitly configures it.
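
The resulting decision could be expressed along these lines. This is a sketch under stated assumptions: the class name and the system property example.decrypt.use.bytebuffer are hypothetical, and the actual configuration mechanism is not specified in this issue.

```java
// Hypothetical sketch of the gating rule; not part of parquet-mr.
class ByteBufferDecryptPolicy {
  // Invented property name, used only for this illustration.
  static final String FORCE_PROPERTY = "example.decrypt.use.bytebuffer";

  // Maps a java.specification.version string to a feature number,
  // e.g. "1.8" to 8, "11" to 11, "17" to 17.
  static int featureVersion(String specVersion) {
    String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
    int dot = v.indexOf('.');
    if (dot >= 0) {
      v = v.substring(0, dot);
    }
    return Integer.parseInt(v);
  }

  // ByteBuffer API only for direct buffers, and only on JDK 17+ or when forced.
  static boolean useByteBufferApi(boolean isDirect, String specVersion, boolean userEnabled) {
    return isDirect && (userEnabled || featureVersion(specVersion) >= 17);
  }

  public static void main(String[] args) {
    String spec = System.getProperty("java.specification.version");
    boolean userEnabled = Boolean.getBoolean(FORCE_PROPERTY);
    System.out.println("use ByteBuffer API for direct buffers: "
        + useByteBufferApi(true, spec, userEnabled));
  }
}
```

Parsing java.specification.version rather than calling Runtime.version() keeps the check usable on Java 8, where Runtime.version() is not available.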



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
