pitrou commented on issue #43057:
URL: https://github.com/apache/arrow/issues/43057#issuecomment-2488978221

   > I haven't yet found the cause of multiple threads getting hold of the same
   > AesDecryptorImpl instance. It'd be great to get some pointers to the mechanics
   > of that.
   
   I think this happens in multiple places.
   
   At the top-most level (the Parquet file reader), a single `CryptoContext` is 
created per column:
   
https://github.com/apache/arrow/blob/9015a81d0e9f9a861509e5e1b6f96c0d8c01a999/cpp/src/parquet/file_reader.cc#L275-L276
   
   This `CryptoContext` has a single data decryptor for the entire column, even 
though different pages or row groups in the column may be read from different 
threads at once.
   
   Worse, most encrypted Parquet files will use the same (footer) key for 
decrypting all columns. This means all `CryptoContext` instances for the 
columns of a given file will hold the same data decryptor:
   
https://github.com/apache/arrow/blob/9015a81d0e9f9a861509e5e1b6f96c0d8c01a999/cpp/src/parquet/encryption/internal_file_decryptor.cc#L224-L227
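
   To illustrate the sharing described above, here is a minimal sketch. It is not the Arrow code: `CachedDecryptor`, `ToyCryptoContext`, `ToyFileDecryptor` and `GetColumnContext` are hypothetical names, and the lazy-caching behaviour is an assumption made for illustration. It only shows how handing out one cached decryptor per key makes every column context point at the same object.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a data decryptor; holds only the key here.
struct CachedDecryptor {
  std::string key;
};

// Hypothetical stand-in for a per-column crypto context.
struct ToyCryptoContext {
  std::shared_ptr<CachedDecryptor> data_decryptor;
};

// Hypothetical file-level decryptor that caches one data decryptor for the
// footer key and hands the same instance to every column (assumed behaviour).
class ToyFileDecryptor {
 public:
  explicit ToyFileDecryptor(std::string footer_key)
      : footer_key_(std::move(footer_key)) {}

  ToyCryptoContext GetColumnContext(const std::string& column_path) {
    (void)column_path;  // in this sketch every column uses the footer key
    if (!footer_data_decryptor_) {
      footer_data_decryptor_ =
          std::make_shared<CachedDecryptor>(CachedDecryptor{footer_key_});
    }
    return ToyCryptoContext{footer_data_decryptor_};
  }

 private:
  std::string footer_key_;
  std::shared_ptr<CachedDecryptor> footer_data_decryptor_;
};
```

   With this shape, two threads reading columns `a` and `b` would call into the same decryptor object, which is only safe if that object is stateless or synchronized.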
   
   As you point out, the decryptors should be created on an ad hoc basis for 
each decryption operation. Hopefully this is a fast operation?
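
   A rough sketch of the "one decryptor per operation" idea, under stated assumptions: `ToyDecryptor` and `DecryptPagesInParallel` are hypothetical names, and the XOR "cipher" is a placeholder rather than AES. The point is only that each thread builds its own short-lived decryptor, so no mutable decryptor state is shared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful decryptor such as AesDecryptorImpl.
// NOT real cryptography: it XORs the input with a repeating key. What matters
// is that Decrypt() mutates member state (pos_), so a single instance must not
// be used from several threads without synchronization.
class ToyDecryptor {
 public:
  explicit ToyDecryptor(std::string key) : key_(std::move(key)) {
    assert(!key_.empty());
  }

  std::string Decrypt(const std::string& ciphertext) {
    pos_ = 0;  // per-call state kept in the object, like a cipher context
    std::string out;
    out.reserve(ciphertext.size());
    for (char c : ciphertext) {
      out.push_back(static_cast<char>(c ^ key_[pos_ % key_.size()]));
      ++pos_;
    }
    return out;
  }

 private:
  std::string key_;
  std::size_t pos_ = 0;
};

// Decrypts each page on its own thread. Every thread constructs a private
// ToyDecryptor, so threads share only the read-only key and input pages.
std::vector<std::string> DecryptPagesInParallel(
    const std::string& key, const std::vector<std::string>& pages) {
  std::vector<std::string> out(pages.size());
  std::vector<std::thread> workers;
  workers.reserve(pages.size());
  for (std::size_t i = 0; i < pages.size(); ++i) {
    workers.emplace_back([&key, &pages, &out, i] {
      ToyDecryptor decryptor(key);  // created ad hoc for this operation
      out[i] = decryptor.Decrypt(pages[i]);  // each thread writes its own slot
    });
  }
  for (auto& worker : workers) worker.join();
  return out;
}
```

   Whether this is cheap enough in practice depends on how expensive constructing the real decryptor is, which this sketch does not measure.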
   


