adriangb opened a new issue, #13270: URL: https://github.com/apache/datafusion/issues/13270
### Describe the bug

With CSV:

```shell
echo "a,b\n1,2" > data1.csv
mkdir a=2
echo "b\n3" > a=2/data2.csv
datafusion-cli
> SELECT * FROM '**/*.csv';
Arrow error: Csv error: incorrect number of fields for line 1, expected 2 got 1
```

With Parquet:

```python
import os
import polars as pl

pl.DataFrame({'a': [1], 'b': [2]}).write_parquet('data1.parquet')
os.mkdir('a=2')
pl.DataFrame({'b': [3]}).write_parquet('a=2/data2.parquet')
```

```shell
datafusion-cli
> SELECT * FROM '**/*.parquet';
+---+---+
| b | a |
+---+---+
| 2 | 1 |
| 3 |   |
+---+---+
2 row(s) fetched.
Elapsed 0.055 seconds.
```

### To Reproduce

_No response_

### Expected behavior

Partition evolution is handled and both cases return

```
+---+---+
| b | a |
+---+---+
| 2 | 1 |
| 3 | 2 |
+---+---+
```

### Additional context

Having played around quite a bit with ParquetExec and the SchemaAdapter machinery I think what should happen is:

- Partition values are on a per-file basis, in particular on each `PartitionedFile` and not on the `FileScanConfig`
- Partition values are passed into the SchemaAdapter machinery and for each file it decides if it needs to add a column generated from partition values or not
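The two points under Additional context can be illustrated with a small, library-free sketch. It is not DataFusion's API: `partition_values` and `adapt_rows` are hypothetical names, files are modelled as plain lists of string tuples, and type casting of partition values is ignored. The idea is only that partition values are derived per file, and that a column is filled from them only when the file does not already contain it.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Parse Hive-style `key=value` directory segments from a file path."""
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def adapt_rows(path, file_columns, rows, table_columns):
    """Map one file's rows onto the table schema.

    For each table column: use the file's own data if the file has the
    column, otherwise fall back to the file's partition value, otherwise
    emit None (null).
    """
    parts = partition_values(path)
    adapted = []
    for row in rows:
        record = dict(zip(file_columns, row))
        out = []
        for col in table_columns:
            if col in record:
                out.append(record[col])
            elif col in parts:
                out.append(parts[col])
            else:
                out.append(None)
        adapted.append(tuple(out))
    return adapted


# The two files from the CSV reproduction, as (path, columns, rows).
files = [
    ("data1.csv", ["a", "b"], [("1", "2")]),
    ("a=2/data2.csv", ["b"], [("3",)]),
]
table_columns = ["b", "a"]

result = [
    row
    for path, cols, rows in files
    for row in adapt_rows(path, cols, rows, table_columns)
]
print(result)
```

Under these assumptions the second file's missing `a` column is filled from its `a=2` directory, which matches the rows in the expected output above.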