adamreeve opened a new issue, #7629:
URL: https://github.com/apache/arrow-rs/issues/7629
**Describe the bug**
Reading a Parquet file that uses modular encryption fails when the page index
is enabled in the `ArrowReaderOptions`, with an error like:
```
ArrowError("Parquet argument error: External: bad data")
```
**To Reproduce**
This test reproduces the issue when added to
`parquet/tests/encryption/encryption_async.rs`:
```rust
#[tokio::test]
async fn test_read_with_page_index() {
    let test_data = arrow::util::test_util::parquet_test_data();
    let path = format!("{test_data}/uniform_encryption.parquet.encrypted");
    let mut file = File::open(&path).await.unwrap();
    let key_code: &[u8] = "0123456789012345".as_bytes();
    let decryption_properties =
        FileDecryptionProperties::builder(key_code.to_vec())
            .build()
            .unwrap();
    let options = ArrowReaderOptions::new()
        .with_file_decryption_properties(decryption_properties)
        .with_page_index(true);
    let arrow_metadata = ArrowReaderMetadata::load_async(&mut file, options)
        .await
        .unwrap();
    let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata(
        file,
        arrow_metadata,
    )
    .build()
    .unwrap();
    let _record_batches =
        record_reader.try_collect::<Vec<_>>().await.unwrap();
}
```
**Expected behavior**
The data should be read successfully and give the same results as when
`with_page_index(false)` is used.
**Additional context**
This was encountered by @corwinjoy when integrating encryption support in
DataFusion. Page indexes are enabled when data is queried with a filter
predicate.