[ https://issues.apache.org/jira/browse/PARQUET-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631252#comment-17631252 ]
ASF GitHub Bot commented on PARQUET-2212:
-----------------------------------------

parthchandra opened a new pull request, #1008:
URL: https://github.com/apache/parquet-mr/pull/1008

   The PR adds the new ByteBuffer API and also updates ColumnChunkPageReadStore.readPage to use it.
   A few additional classes were touched (ParquetReader.Builder, BytesInput) to allow an allocator to be specified and/or to avoid ByteBuffer -> byte array copying. These changes were necessary to enable the unit test.
   A user option has been added to explicitly enable/disable the use of the ByteBuffer API for decryption.

   ### Jira

   - My PR addresses [Parquet 2212](https://issues.apache.org/jira/browse/PARQUET-2212)

   ### Tests

   - Updates unit test(s) in `org.apache.parquet.crypto.TestPropertiesDrivenEncryption`


> Add ByteBuffer api for decryptors to allow direct memory to be decrypted
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-2212
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2212
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.3
>            Reporter: Parth Chandra
>            Priority: Major
>             Fix For: 1.12.3
>
>
> The decrypt API in BlockCipher.Decryptor currently only provides a method that
> takes a byte array:
> {code:java}
> byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);{code}
> A Parquet reader that uses the DirectByteBufferAllocator has to incur the
> cost of copying the data into a byte array (and sometimes back to a
> DirectByteBuffer) to decrypt data.
> This proposes adding a new API that accepts a ByteBuffer as input and avoids
> the data copy:
> {code:java}
> ByteBuffer decrypt(ByteBuffer from, byte[] AAD);{code}
> The decryption in ColumnChunkPageReadStore can also be updated to use the
> ByteBuffer-based API if the buffer is a DirectByteBuffer. If the buffer is a
> HeapByteBuffer, then we can continue to use the byte array API, since that
> does not incur a copy when the underlying byte array is accessed.
> Also, some investigation has shown that decryption with ByteBuffers is not
> able to use hardware acceleration in JVMs before JDK 17. In those cases, the
> overall decryption speed is faster with byte arrays, even after incurring the
> overhead of making a copy.
> The proposal, then, is to enable the use of the ByteBuffer API for
> DirectByteBuffers only, and only if the JDK is JDK 17 or higher or the user
> explicitly configures it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
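
For readers following the thread, the selection rule proposed in the issue description (ByteBuffer API only for direct buffers, and only on JDK 17 or higher or when the user explicitly configures it) might be sketched roughly as below. This is an illustration, not the code in the pull request: the class `DecryptPathChooser`, the method `useByteBufferApi`, and the `userOverride` parameter are hypothetical names, and treating an explicit user setting as taking precedence over the JDK check is an assumption. `Runtime.version().feature()` requires Java 10 or later.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper illustrating the rule described in PARQUET-2212.
 * It is NOT part of parquet-mr; all names here are assumptions.
 */
final class DecryptPathChooser {
    private DecryptPathChooser() {}

    /**
     * Decides whether the ByteBuffer-based decrypt path should be used.
     *
     * @param buffer            the page buffer about to be decrypted
     * @param jdkFeatureVersion the JDK feature version (e.g. 11, 17)
     * @param userOverride      explicit user setting, or null if unset (assumed semantics)
     */
    static boolean useByteBufferApi(ByteBuffer buffer, int jdkFeatureVersion, Boolean userOverride) {
        // Heap buffers keep using the byte[] API: the backing array is
        // accessible without a copy.
        if (!buffer.isDirect()) {
            return false;
        }
        // Assumption: an explicit user setting wins over the JDK default.
        if (userOverride != null) {
            return userOverride;
        }
        // Per the issue text, ByteBuffer decryption is not hardware
        // accelerated before JDK 17, so default to it only on 17+.
        return jdkFeatureVersion >= 17;
    }

    /** Convenience overload that reads the running JVM's feature version. */
    static boolean useByteBufferApi(ByteBuffer buffer, Boolean userOverride) {
        return useByteBufferApi(buffer, Runtime.version().feature(), userOverride);
    }
}
```

With this sketch, a direct buffer on JDK 11 with no user setting would fall back to the byte array path (copy included), while the same buffer on JDK 17 would go through the ByteBuffer path.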