rluvaton opened a new issue, #8495:
URL: https://github.com/apache/arrow-rs/issues/8495
**Describe the bug**
When reading a file that was created with an older parquet writer (parquet-mr
specifically), passing a schema obtained from `ArrowReaderMetadata` back to the
reader fails with:
```
ArrowError("incompatible arrow schema, expected struct got List(Field {
name: \"col_15\", data_type: Struct([Field { name: \"col_16\", data_type: Utf8,
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field {
name: \"col_17\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered:
false, metadata: {} }, Field { name: \"col_18\", data_type: Struct([Field {
name: \"col_19\", data_type: Int64, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: \"col_20\", data_type:
Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]),
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable:
false, dict_id: 0, dict_is_ordered: false, metadata: {} })")
```
**To Reproduce**
I've added the file in:
- https://github.com/apache/parquet-testing/pull/96
#### 1 liner
Run this in datafusion-cli
```sql
select * from
'https://github.com/apache/parquet-testing/raw/6d1dae7ac5dfb23fa1ac1fed5b77d3b919fbb5f8/data/backward_compat_nested.parquet';
```
#### Only the relevant parts
This reproduction extracts from `datafusion` only the parts relevant to
reaching that error.
`Cargo.toml`:
```toml
[package]
name = "repro"
version = "0.1.0"
edition = "2024"
[dependencies]
arrow = "56.2.0"
parquet = "56.2.0"
bytes = "1.10.1"
```
`main.rs`:
```rust
use std::sync::Arc;

use bytes::Bytes;
use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};

fn main() {
    // The file is the one added in:
    // https://github.com/apache/parquet-testing/pull/96
    let file_path =
        "/private/tmp/parquet-testing/data/backward_compat_nested.parquet".to_string();
    let data = Bytes::from(std::fs::read(file_path).unwrap());

    let mut options = ArrowReaderOptions::new();
    let reader_metadata = ArrowReaderMetadata::load(&data, options.clone()).unwrap();
    let physical_file_schema = Arc::clone(reader_metadata.schema());

    // Commenting this out will make the code work
    options = options.with_schema(Arc::clone(&physical_file_schema));

    ArrowReaderMetadata::try_new(Arc::clone(reader_metadata.metadata()), options).unwrap();
}
```
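As the inline comment above hints, the failure is tied to round-tripping the schema through `with_schema`. A minimal workaround sketch, assuming the same local file path, is to let the reader derive the Arrow schema itself instead of passing it back:

```rust
use std::sync::Arc;

use bytes::Bytes;
use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};

fn main() {
    let file_path = "/private/tmp/parquet-testing/data/backward_compat_nested.parquet";
    let data = Bytes::from(std::fs::read(file_path).unwrap());

    // Derive the metadata (and the Arrow schema) from the file itself.
    let reader_metadata = ArrowReaderMetadata::load(&data, ArrowReaderOptions::new()).unwrap();

    // Reusing the parquet metadata with *default* options succeeds; only
    // supplying the previously derived schema via `with_schema` triggers
    // the "incompatible arrow schema" error.
    ArrowReaderMetadata::try_new(
        Arc::clone(reader_metadata.metadata()),
        ArrowReaderOptions::new(),
    )
    .unwrap();
}
```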
**Expected behavior**
Reading should succeed: a schema returned by `ArrowReaderMetadata::load` should
be accepted when passed back via `ArrowReaderOptions::with_schema`.
**Additional context**
This might be a bug in DataFusion rather than in the parquet reader here: due
to the backward-compatibility rules, the schema derived from the file was
updated to the new representation:
-
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules
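For illustration, here is the shape difference those rules describe, in Parquet message-text form (field names are taken from the error message; the exact annotations in the test file are an assumption):

```
// Modern 3-level LIST encoding: readers map this to List<Struct<...>>.
optional group col_15 (LIST) {
  repeated group list {
    optional group element {
      optional binary col_16 (STRING);
      optional binary col_17 (STRING);
    }
  }
}

// Legacy parquet-mr 2-level encoding: the repeated group itself is the
// element. Per the backward-compatibility rules, readers must still
// interpret this as a list of structs, not as a plain repeated struct.
optional group col_15 (LIST) {
  repeated group col_15_tuple {
    optional binary col_16 (STRING);
    optional binary col_17 (STRING);
  }
}
```

If the reader applies these rules when deriving the Arrow schema but compares the supplied schema against the raw parquet structure, a mismatch like "expected struct got List" would follow.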