Cheappie commented on issue #2179:
URL:
https://github.com/apache/arrow-datafusion/issues/2179#issuecomment-1103159840
@alamb well, in my case it is impossible to run the query, because
datafusion (ArrowReader) fails to load a parquet file that was serialized via
ArrowWriter.
From the error below, it looks like the schema is missing a field. But
ArrowReader actually reads the schema correctly. The field is lost a bit later,
or maybe the struct is interpreted incorrectly somewhere in datafusion.
```
expected: Struct([Field { name: \"most\"}, Field { name: \"least\"}])
but found: Struct([Field { name: \"most\"]) at column index 0")
```
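To narrow down where the field goes missing, the two field lists from the error can be diffed by name. A minimal sketch in plain Rust: `missing_fields` is a hypothetical helper, not part of arrow or datafusion, and the field names are typed in by hand from the error above.

```rust
/// Hypothetical helper (not an arrow/datafusion API): returns the names
/// present in `expected` but absent from `found`, in `expected` order.
fn missing_fields(expected: &[&str], found: &[&str]) -> Vec<String> {
    expected
        .iter()
        .filter(|name| !found.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```

Calling `missing_fields(&["most", "least"], &["most"])` reports `least` as the field that was dropped, which matches the error.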
What's even more interesting: using the same ArrowReader (ParquetFileArrowReader)
that datafusion uses internally, I was able to read this parquet file without
issues and access both columns of the struct using the snippet below.
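```
use std::sync::Arc;
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

// `file` is assumed to be the already-opened parquet file (e.g. a std::fs::File).
let rd = SerializedFileReader::new(file).expect("...");
let mut pqrd = ParquetFileArrowReader::new(Arc::new(rd));
// Read the file in record batches of up to 60 rows.
for batch in pqrd.get_record_reader(60).expect("...") {
    let batch = batch.unwrap();
    // Column 0 is the struct column; its child data holds the struct fields.
    let col = batch.column(0);
    let child_data = col.data().child_data();
    println!("{:?}", child_data[0].buffers().get(0));
    println!("{:?}", child_data[1].buffers().get(0));
}
```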