andrei-ionescu commented on pull request #1392:
URL: https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-985775205
@houqp After more debugging and fixing several issues, I found that the
physical plan lacks support for nested fields.
I ran into this error:
```
Error: ArrowError(SchemaError("Unexpected batch schema from file, expected
36 cols but got 6"))
```
The error is raised in these lines of code:
[physical_plan/file_format/mod.rs#L223-L229](https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/file_format/mod.rs#L223-L229).
The chunk of data that has been read has only 6 columns while the expected
number of columns is 36.
The root cause seems to be a mismatch between how parquet files are read and
how the result is projected. The reader reads one top-level nested column at a
time, and then tries to project that chunk of data over the full schema. For
example, for `nested_struct.rust.parquet` it reads the first column, which has
6 leaves, and then tries to project that over all 36 top-level columns of the
parquet file. This mismatch produces the error above.
It seems that DataFusion lacks support for nested fields, at least when
using the parquet data source.
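
To make the mismatch concrete, here is a minimal sketch of how I understand it. It is not the DataFusion or Arrow code: `Field`, `check_batch_width`, `example_schema` and `read_first_column` are hypothetical names, and the schema shape (36 top-level columns, the first one a struct with 6 leaves) is assumed from the numbers in the error message.

```rust
// Hypothetical, simplified schema model for illustration only;
// these are not the Arrow or DataFusion types.
enum Field {
    Leaf,
    Struct(Vec<Field>),
}

impl Field {
    // Number of primitive (leaf) columns stored under this field.
    fn leaf_count(&self) -> usize {
        match self {
            Field::Leaf => 1,
            Field::Struct(children) => children.iter().map(Field::leaf_count).sum(),
        }
    }
}

// Mimics a width check like the one that reports the error:
// the batch must have as many columns as the expected schema.
fn check_batch_width(expected_cols: usize, batch_cols: usize) -> Result<(), String> {
    if expected_cols == batch_cols {
        Ok(())
    } else {
        Err(format!(
            "Unexpected batch schema from file, expected {} cols but got {}",
            expected_cols, batch_cols
        ))
    }
}

// Assumed shape: 36 top-level columns, the first a struct with 6 leaves.
fn example_schema() -> Vec<Field> {
    let mut fields = vec![Field::Struct((0..6).map(|_| Field::Leaf).collect())];
    fields.extend((1..36).map(|_| Field::Leaf));
    fields
}

// Read only the first top-level column, then compare its leaf count
// against the number of top-level columns in the full schema.
fn read_first_column(schema: &[Field]) -> Result<(), String> {
    let batch_cols = schema[0].leaf_count();
    check_batch_width(schema.len(), batch_cols)
}
```

Calling `read_first_column(&example_schema())` compares the 6 leaves of the first column against the 36 top-level columns and fails, which is the shape of the failure described above. The real check may count columns differently; the point is only that the two sides are counting different things.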