MainRo commented on PR #39216: URL: https://github.com/apache/arrow/pull/39216#issuecomment-2666238198
The random crash is related to the combination of encryption, multi-column access, and multi-threading. I ran several experiments, and it seems to happen only when we read/write multiple columns with multi-threading enabled:

- When reading a single column, I was not able to reproduce the crash.
- When disabling multi-threading, I was not able to reproduce the crash.

The issue is present with both the ParquetFile and Dataset APIs.

One possibility is that a single cipher context for each column is shared across multiple threads, while we should create one per thread. I am not familiar with the code base, so this may not be the cause of the issue. If it is, then similar crashes may happen when a single encryption key is shared by several columns (in column_key encryption).

Here is a commit where I disabled multi-threading for these tests, and the crash no longer occurs: https://github.com/MainRo/arrow/commit/38341b86b53d28037fac7dafa1b0b6af0e606459#diff-4d00e9ed2c9a418aead5c9a77113a9ea7162aed76a0e52a4bfacc820f0abed4c
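
To make the "one context per thread" idea concrete, here is a minimal Python sketch. It is not Arrow code and does not reflect how the Parquet encryption layer is actually implemented: `ToyCipherContext`, `context_for_current_thread`, `encrypt_columns`, and `created_contexts` are hypothetical names, and a simple XOR stands in for the real cipher. It only illustrates how each worker thread could get its own context, assuming the shared context really is the problem.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyCipherContext:
    """Hypothetical stand-in for a stateful cipher context (not a real API)."""

    def __init__(self, key: bytes):
        self.key = key
        self.used_by = set()  # identifiers of the threads that used this context

    def apply(self, data: bytes) -> bytes:
        # XOR is its own inverse, so the same call encrypts and decrypts.
        self.used_by.add(threading.get_ident())
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))


_local = threading.local()      # per-thread storage
created_contexts = []           # every context ever created, for inspection
_registry_lock = threading.Lock()


def context_for_current_thread(key: bytes) -> ToyCipherContext:
    """Return a context owned by the calling thread, creating it on first use."""
    contexts = getattr(_local, "contexts", None)
    if contexts is None:
        contexts = _local.contexts = {}
    if key not in contexts:
        ctx = ToyCipherContext(key)
        contexts[key] = ctx
        with _registry_lock:
            created_contexts.append(ctx)
    return contexts[key]


def encrypt_columns(columns: dict, key: bytes, max_workers: int = 4) -> dict:
    """Process each column on a worker thread using that thread's own context."""

    def work(item):
        name, data = item
        return name, context_for_current_thread(key).apply(data)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(work, columns.items()))
```

With this layout, several columns can share one key while no context object is ever touched by more than one thread, which is the property the hypothesis above says may be missing.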
