rluvaton opened a new issue, #8495:
URL: https://github.com/apache/arrow-rs/issues/8495

   **Describe the bug**
   When reading a file that was created with an older parquet writer (parquet-mr specifically), passing a schema obtained from `ArrowReaderMetadata` fails with:
   ```
   ArrowError("incompatible arrow schema, expected struct got List(Field { 
name: \"col_15\", data_type: Struct([Field { name: \"col_16\", data_type: Utf8, 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { 
name: \"col_17\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }, Field { name: \"col_18\", data_type: Struct([Field { 
name: \"col_19\", data_type: Int64, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }, Field { name: \"col_20\", data_type: 
Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: 
false, dict_id: 0, dict_is_ordered: false, metadata: {} })")
   ```
   
   **To Reproduce**
   
   I've added the file in:
   - https://github.com/apache/parquet-testing/pull/96
   
   #### One-liner
   
   Run this in datafusion-cli
   ```sql
   select * from 
'https://github.com/apache/parquet-testing/raw/6d1dae7ac5dfb23fa1ac1fed5b77d3b919fbb5f8/data/backward_compat_nested.parquet';
   ```
   
   
   #### Only the relevant parts
   
   This is the reproduction, taking from `datafusion` only the parts relevant to reaching that error.
   
   `Cargo.toml`:
   ```toml
   [package]
   name = "repro"
   version = "0.1.0"
   edition = "2024"
   
   [dependencies]
   arrow = "56.2.0"
   parquet = "56.2.0"
   bytes = "1.10.1"
   ```
   
   `main.rs`:
   ```rust
    use std::sync::Arc;
    use bytes::Bytes;
    use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};

    fn main() {
        // The file is the one added in https://github.com/apache/parquet-testing/pull/96
        let file_path =
            "/private/tmp/parquet-testing/data/backward_compat_nested.parquet".to_string();

        let data = Bytes::from(std::fs::read(file_path).unwrap());

        let mut options = ArrowReaderOptions::new();
        let reader_metadata = ArrowReaderMetadata::load(&data, options.clone()).unwrap();

        // Schema derived by the reader from the file metadata
        let physical_file_schema = Arc::clone(reader_metadata.schema());

        // Commenting this out will make the code work
        options = options.with_schema(Arc::clone(&physical_file_schema));

        // Fails with "incompatible arrow schema, expected struct got List(...)"
        ArrowReaderMetadata::try_new(Arc::clone(reader_metadata.metadata()), options).unwrap();
    }
   ```
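
   For contrast, here is a minimal sketch (assuming, as the comment above suggests, that the failure is specific to the `with_schema` check) which builds a reader for the same file without supplying a schema; under that assumption it should decode the file without error:

    ```rust
    use bytes::Bytes;
    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

    fn main() {
        let data = Bytes::from(
            std::fs::read("/private/tmp/parquet-testing/data/backward_compat_nested.parquet")
                .unwrap(),
        );

        // No schema is supplied here, so the reader derives it from the file metadata
        let reader = ParquetRecordBatchReaderBuilder::try_new(data)
            .unwrap()
            .build()
            .unwrap();

        for batch in reader {
            println!("read {} rows", batch.unwrap().num_rows());
        }
    }
    ```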
   
   **Expected behavior**
   Loading the metadata with the schema that was just read from the same file should not fail.
   
   **Additional context**
   This might be a bug in DataFusion rather than in the parquet reader here: due to the backward compatibility rules, the schema gets converted to the new representation:
   - https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules
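
   If the problem is on the parquet-rs side, one way to look for the mismatch is to compare the parquet schema stored in the file (with the legacy nested list encoding written by parquet-mr) against the parquet schema produced by converting the reader-derived Arrow schema back. This is only a diagnostic sketch under that assumption:

    ```rust
    use bytes::Bytes;
    use parquet::arrow::arrow_reader::{ArrowReaderMetadata, ArrowReaderOptions};
    use parquet::arrow::ArrowSchemaConverter;
    use parquet::schema::printer::print_schema;

    fn main() {
        let data = Bytes::from(
            std::fs::read("/private/tmp/parquet-testing/data/backward_compat_nested.parquet")
                .unwrap(),
        );

        let reader_metadata = ArrowReaderMetadata::load(&data, ArrowReaderOptions::new()).unwrap();

        // Parquet schema exactly as written by parquet-mr (legacy nested list encoding)
        let file_schema = reader_metadata.metadata().file_metadata().schema_descr();

        // Parquet schema obtained by converting the reader-derived Arrow schema back
        let roundtripped = ArrowSchemaConverter::new()
            .convert(reader_metadata.schema())
            .unwrap();

        // If the two print differently, a strict structural comparison of a
        // user-supplied schema against the file schema would reject this file
        print_schema(&mut std::io::stdout(), file_schema.root_schema());
        print_schema(&mut std::io::stdout(), roundtripped.root_schema());
    }
    ```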
 
   

