pitrou commented on issue #43057: URL: https://github.com/apache/arrow/issues/43057#issuecomment-2488978221
> I haven't yet found the cause of multiple threads getting hold of the same AesDecryptorImpl instance. It'd be great to get some pointers to the mechanics of that.

I think this happens in multiple places.

At the top-most level (the Parquet file reader), a single `CryptoContext` is created per column:
https://github.com/apache/arrow/blob/9015a81d0e9f9a861509e5e1b6f96c0d8c01a999/cpp/src/parquet/file_reader.cc#L275-L276

This `CryptoContext` has a single data decryptor for the entire column, even though different pages or row groups in the column may be read from different threads at once.

Worse, most encrypted Parquet files will use the same (footer) key for decrypting all columns. This means all `CryptoContext` instances for the columns of a given file will hold the same data decryptor:
https://github.com/apache/arrow/blob/9015a81d0e9f9a861509e5e1b6f96c0d8c01a999/cpp/src/parquet/encryption/internal_file_decryptor.cc#L224-L227

As you point out, the decryptors should be created on an ad hoc basis for each decryption operation. Hopefully this is a fast operation?
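To make the "one decryptor per decryption operation" idea concrete, here is a minimal sketch. It does not use the Parquet C++ API: `ToyDecryptor`, `ToyDecryptorFactory` and `DecryptPagesInParallel` are hypothetical names, and a XOR with the key stands in for AES. The only point is the ownership pattern, where the shared object holds immutable key material and each thread builds its own short-lived stateful decryptor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful decryptor such as AesDecryptorImpl.
// It mutates an internal scratch buffer on every call, so a single instance
// must not be used from two threads at the same time.
class ToyDecryptor {
 public:
  explicit ToyDecryptor(std::string key) : key_(std::move(key)) {
    assert(!key_.empty());  // assumption of this sketch: the key is non-empty
  }

  // XOR with the key is a placeholder for the real cipher (NOT real crypto).
  std::string Decrypt(const std::string& ciphertext) {
    scratch_.assign(ciphertext.begin(), ciphertext.end());
    for (std::size_t i = 0; i < scratch_.size(); ++i) {
      scratch_[i] ^= key_[i % key_.size()];
    }
    return std::string(scratch_.begin(), scratch_.end());
  }

 private:
  std::string key_;
  std::vector<char> scratch_;  // mutable per-instance state
};

// Hypothetical factory: it only stores the immutable key, so sharing it
// between columns and threads is safe. Each call returns a fresh decryptor.
class ToyDecryptorFactory {
 public:
  explicit ToyDecryptorFactory(std::string key) : key_(std::move(key)) {}

  std::unique_ptr<ToyDecryptor> Make() const {
    return std::make_unique<ToyDecryptor>(key_);
  }

 private:
  const std::string key_;
};

// Decrypts each page on its own thread. Every thread creates its own
// decryptor, so no mutable decryptor state is shared between threads.
std::vector<std::string> DecryptPagesInParallel(
    const ToyDecryptorFactory& factory,
    const std::vector<std::string>& pages) {
  std::vector<std::string> out(pages.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    workers.emplace_back([&, i] {
      auto decryptor = factory.Make();  // per-operation instance
      out[i] = decryptor->Decrypt(pages[i]);
    });
  }
  for (auto& worker : workers) worker.join();
  return out;
}
```

Whether this pattern is cheap enough in the real code depends on how expensive it is to construct the actual decryptor, which the sketch says nothing about; that would need to be measured.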
