MainRo commented on PR #39216:
URL: https://github.com/apache/arrow/pull/39216#issuecomment-2666238198

   The random crash is related to the combination of encryption, multi-column 
access, and multi-threading.
   
   I ran several experiments, and the crash seems to happen only when we 
read/write multiple columns with multi-threading enabled:
   
   - When reading a single column, I was not able to reproduce the crash. 
   - With multi-threading disabled, I was not able to reproduce the crash.
   
   The issue is present with both the ParquetFile and Dataset APIs.
   
   One possibility is that a single cipher context per column is shared 
across multiple threads, when we should create one per thread. I am not 
familiar with the code base, so this may not be the cause of the issue. If it 
is, similar crashes may happen when a single encryption key is shared across 
several columns (in column_key encryption).
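
   To illustrate the "one context per thread" idea, here is a minimal sketch 
in plain Python. It is not Arrow code: `ToyCipherContext`, 
`context_for_this_thread` and `encrypt_column` are hypothetical names, and the 
XOR "cipher" only stands in for a stateful context whose scratch buffer would 
be unsafe to share between threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyCipherContext:
    """Hypothetical stand-in for a stateful cipher context (not Arrow's class)."""

    def __init__(self, key: bytes):
        self.key = key
        self.buffer = bytearray()  # mutable scratch state, unsafe to share

    def encrypt(self, data: bytes) -> bytes:
        # XOR "cipher" for illustration only; this is NOT real encryption.
        self.buffer.clear()
        for i, byte in enumerate(data):
            self.buffer.append(byte ^ self.key[i % len(self.key)])
        return bytes(self.buffer)


# Each thread sees its own attributes on this object.
_local = threading.local()


def context_for_this_thread(key: bytes) -> ToyCipherContext:
    """Return a context owned by the calling thread, creating it on first use."""
    contexts = getattr(_local, "contexts", None)
    if contexts is None:
        contexts = _local.contexts = {}
    if key not in contexts:
        contexts[key] = ToyCipherContext(key)
    return contexts[key]


def encrypt_column(args):
    key, column = args
    # The same key may be used for several columns, but never the same
    # context object from two threads at once.
    return context_for_this_thread(key).encrypt(column)


key = b"k3y"
columns = [bytes([i]) * 1000 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    encrypted = list(pool.map(encrypt_column, [(key, c) for c in columns]))

# XOR is its own inverse, so applying it again restores each column.
decrypted = [ToyCipherContext(key).encrypt(e) for e in encrypted]
assert decrypted == columns
```

   With this pattern, the key can still be shared across columns, while the 
mutable context state stays private to each worker thread. Whether the actual 
crash comes from shared context state is still only a hypothesis.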
   
   Here is a commit where I disabled multi-threading for these tests, and the 
crash no longer occurs:
   
https://github.com/MainRo/arrow/commit/38341b86b53d28037fac7dafa1b0b6af0e606459#diff-4d00e9ed2c9a418aead5c9a77113a9ea7162aed76a0e52a4bfacc820f0abed4c

