nantunes opened a new issue, #15839: URL: https://github.com/apache/datafusion/issues/15839
### Describe the bug

When querying an Avro file table in DataFusion, column selection works fine when all columns, or a subset of them, are selected in schema order. However, if the column order in the SELECT statement differs from the original schema order, the query fails with a type mismatch error.

This happens because the current Avro reader implementation doesn't respect the ordering of columns specified in the projection when creating the RecordBatch. The reader creates the arrays correctly but doesn't match them with the expected schema ordering.

### To Reproduce

1. Create an Avro file with multiple columns of different types (e.g., username: string, tweet: string, timestamp: int64)
2. Register it as a table in DataFusion
3. Try different query patterns:

```
// This works (all columns in original order)
SELECT * FROM avro_file1
+------------+-------------------------------------+------------+
| username   | tweet                               | timestamp  |
+------------+-------------------------------------+------------+
| miguno     | Rock: Nerf paper, scissors is fine. | 1366150681 |
| BlizzardCS | Works as intended. Terran is IMBA.  | 1366154481 |
+------------+-------------------------------------+------------+

// This works (subset of columns in original order)
SELECT username, timestamp FROM avro_file1
+------------+------------+
| username   | timestamp  |
+------------+------------+
| miguno     | 1366150681 |
| BlizzardCS | 1366154481 |
+------------+------------+

// This fails (reordered columns)
SELECT timestamp, username FROM avro_file1
❌ column types must match schema types, expected Int64 but found Utf8 at column index 0
```

### Expected behavior

All three queries should work correctly.
The third query should return the columns in the order specified in the SELECT statement:

```
+------------+------------+
| timestamp  | username   |
+------------+------------+
| 1366150681 | miguno     |
| 1366154481 | BlizzardCS |
+------------+------------+
```

### Additional context

The issue is in the Avro reader implementation, specifically in how it handles projections. When columns are reordered in the query, the reader creates arrays in the original schema order but the output schema expects them in the reordered sequence, leading to a type mismatch.

This issue only affects the Avro reader - other formats like Parquet and CSV seem to handle column reordering correctly.
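
To make the mismatch described above concrete, here is a toy model in plain Python. It is only a sketch of the suspected behavior, not DataFusion's or the Avro reader's actual code: every name in it (`FILE_SCHEMA`, `build_batch`, `read_file_order`, `read_projection_order`) is hypothetical, and the type check merely imitates the error message quoted in the reproduction.

```python
# Toy model of the projection mismatch. NOT DataFusion code: all names
# below are hypothetical and exist only for illustration.

FILE_SCHEMA = [("username", "Utf8"), ("tweet", "Utf8"), ("timestamp", "Int64")]

ROWS = [
    {"username": "miguno",
     "tweet": "Rock: Nerf paper, scissors is fine.",
     "timestamp": 1366150681},
    {"username": "BlizzardCS",
     "tweet": "Works as intended. Terran is IMBA.",
     "timestamp": 1366154481},
]


def type_of(value):
    """Map a Python value to the toy type names used in FILE_SCHEMA."""
    return "Int64" if isinstance(value, int) else "Utf8"


def build_batch(projected_schema, columns):
    """Pair columns with schema fields by position and check their types."""
    for index, ((_, expected), column) in enumerate(zip(projected_schema, columns)):
        found = type_of(column[0])
        if found != expected:
            raise TypeError(
                "column types must match schema types, "
                f"expected {expected} but found {found} at column index {index}"
            )
    return {name: column for (name, _), column in zip(projected_schema, columns)}


def read_file_order(projection):
    """Suspected buggy path: columns are built in file-schema order,
    while the output schema follows the order of the projection."""
    types = dict(FILE_SCHEMA)
    wanted = set(projection)
    columns = [[row[name] for row in ROWS]
               for name, _ in FILE_SCHEMA if name in wanted]
    schema = [(name, types[name]) for name in projection]
    return build_batch(schema, columns)


def read_projection_order(projection):
    """Expected path: columns are built in the order the projection lists them."""
    types = dict(FILE_SCHEMA)
    columns = [[row[name] for row in ROWS] for name in projection]
    schema = [(name, types[name]) for name in projection]
    return build_batch(schema, columns)
```

In this model, `read_file_order(["username", "timestamp"])` succeeds because the projection happens to follow file order, while `read_file_order(["timestamp", "username"])` raises the same kind of type mismatch at column index 0. `read_projection_order(["timestamp", "username"])` returns the columns in the requested order.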