Cheappie commented on issue #2179:
URL: 
https://github.com/apache/arrow-datafusion/issues/2179#issuecomment-1103159840

   @alamb In my case it is impossible to run the query at all, because 
datafusion (via ArrowReader) fails to load a parquet file that was serialized 
with ArrowWriter.
   
   The error below suggests that the schema is missing a field. However, 
ArrowReader reads the schema correctly; one field is lost only later, or 
perhaps the struct is interpreted incorrectly somewhere in datafusion.
   ```
   expected: Struct([Field { name: \"most\"}, Field { name: \"least\"}]) 
   but found: Struct([Field { name: \"most\"]) at column index 0")
   ```
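
   As a rough illustration of how such a message can arise (a toy model, not 
datafusion's or arrow's actual code): if some step prunes a child of the 
struct column while the expected schema still lists both children, a 
per-column type check fails at that column. Everything in the sketch is 
hypothetical, including the `Int64` child types, the column name `col`, and the 
helpers `check_columns` and `project_struct`; only the field names `most` and 
`least` come from the error above.

```rust
// Toy stand-ins for Arrow's DataType/Field; NOT the real arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// The struct from the error message; the Int64 child types are assumed.
fn example_struct() -> DataType {
    DataType::Struct(vec![
        field("most", DataType::Int64),
        field("least", DataType::Int64),
    ])
}

/// Compare each column's type with the schema and report the first mismatch.
fn check_columns(schema: &[Field], columns: &[DataType]) -> Result<(), String> {
    if schema.len() != columns.len() {
        return Err(format!(
            "expected {} columns but found {}",
            schema.len(),
            columns.len()
        ));
    }
    for (i, (f, col)) in schema.iter().zip(columns).enumerate() {
        if &f.data_type != col {
            return Err(format!(
                "expected: {:?} but found: {:?} at column index {}",
                f.data_type, col, i
            ));
        }
    }
    Ok(())
}

/// Hypothetical projection step that keeps only the named struct children.
fn project_struct(dt: &DataType, keep: &[&str]) -> DataType {
    match dt {
        DataType::Struct(fields) => DataType::Struct(
            fields
                .iter()
                .filter(|f| keep.contains(&f.name.as_str()))
                .cloned()
                .collect(),
        ),
        other => other.clone(),
    }
}
```

   In this model, checking the full struct against the schema succeeds, 
while checking `project_struct(&example_struct(), &["most"])` against the same 
schema returns an error that names both expected children and ends with 
"at column index 0".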
   
   What's even more interesting: using the same ArrowReader 
(ParquetFileArrowReader) that datafusion uses internally, I was able to read 
this parquet file without issues and access both columns of the struct with 
the snippet below.
   ```
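       use std::sync::Arc;
       use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
       use parquet::file::reader::SerializedFileReader;
   
       // `file` is the already-opened parquet file written by ArrowWriter.
       let rd = SerializedFileReader::new(file).expect("...");
       let mut pqrd = ParquetFileArrowReader::new(Arc::new(rd));
   
       // Read in batches of 60 rows; print the first buffer of each struct child.
       for batch in pqrd.get_record_reader(60).expect("...") {
           let batch = batch.unwrap();
           let col = batch.column(0);
           let child_data = col.data().child_data();
           println!("{:?}", child_data[0].buffers().get(0));
           println!("{:?}", child_data[1].buffers().get(0));
       }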
   ```

