tolleybot opened a new pull request, #39623: URL: https://github.com/apache/arrow/pull/39623
### GH-39444: [C++/Python][Parquet] Fix Segmentation Fault in Modular Encryption

**Rationale for this change:**

This pull request addresses GH-39444, a segmentation fault in the C++/Python Parquet components that occurs when processing encrypted datasets with more than 2^15 rows. The fix modifies `InternalFileDecryptor::GetColumnDecryptor` in `cpp/src/parquet/encryption/internal_file_decryptor.cc`: caching of the `Decryptor` object was removed to resolve the multithreading issue that caused the segmentation fault and encryption failures.

**What changes are included in this PR?**

- Removal of `Decryptor` object caching in `InternalFileDecryptor::GetColumnDecryptor`.
- Addition of two unit tests: `large_row_parquet_encrypt_test.cc` for C++ and an update to `test_dataset_encryption.py` with `test_large_row_encryption_decryption` for Python.

**Are these changes tested?**

Yes. The unit tests (`large_row_parquet_encrypt_test.cc` and `test_large_row_encryption_decryption` in `test_dataset_encryption.py`) were added to cover these changes.

**Are there any user-facing changes?**

No significant user-facing changes, but the fix improves the stability and reliability of Parquet file handling.
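
The rationale above attributes the crash to a cached `Decryptor` object being shared across threads. As a rough, self-contained illustration of that idea (not the Arrow code itself), the sketch below uses hypothetical stand-in classes, `ToyDecryptor` and `ToyFileDecryptor`, with a toy XOR "cipher". It only shows the pattern the PR describes: the factory method returns a fresh object on each call instead of a cached one, so each thread works with its own mutable state.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyDecryptor:
    """Hypothetical stand-in for a stateful decryptor (not Parquet's Decryptor)."""

    def __init__(self, key: int):
        self.key = key
        self.scratch = bytearray()  # mutable per-object state

    def decrypt(self, data: bytes) -> bytes:
        # The scratch buffer is instance state; if two threads shared one
        # instance, they could overwrite each other's intermediate bytes.
        self.scratch = bytearray(data)
        for i in range(len(self.scratch)):
            self.scratch[i] ^= self.key
        return bytes(self.scratch)


class ToyFileDecryptor:
    """Hypothetical analogue of a per-file decryptor factory."""

    def __init__(self, key: int):
        self.key = key

    def get_column_decryptor(self, column_path: str) -> ToyDecryptor:
        # No cache keyed by column_path: every caller gets its own object.
        return ToyDecryptor(self.key)


def decrypt_chunk(factory: ToyFileDecryptor, column_path: str, chunk: bytes) -> bytes:
    # Each task requests its own decryptor before decrypting its chunk.
    return factory.get_column_decryptor(column_path).decrypt(chunk)


def run_demo(num_chunks: int = 64, key: int = 0x5A) -> bool:
    factory = ToyFileDecryptor(key)
    plain = [bytes([i % 256]) * 1024 for i in range(num_chunks)]
    encrypted = [bytes(b ^ key for b in p) for p in plain]
    with ThreadPoolExecutor(max_workers=8) as pool:
        decrypted = list(pool.map(lambda c: decrypt_chunk(factory, "col", c), encrypted))
    # True when every chunk round-trips back to its plaintext.
    return decrypted == plain
```

The trade-off in this sketch is one extra object construction per call in exchange for not having to synchronize access to shared mutable state; whether the same cost profile holds for the real `Decryptor` is not stated in the PR description.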
