alex opened a new issue, #12852: URL: https://github.com/apache/datafusion/issues/12852
### Describe the bug

When working with a CSV file that has duplicate column headers (e.g., https://github.com/openssl/openssl/blob/master/test/recipes/80-test_cmp_http_data/test_connection.csv), the behavior is confusing. Specifically, any query against a table backed by that file produces the error:

> Arrow error: Csv error: incorrect number of fields for line 1, expected 14 got more than 14

However, all rows in that CSV have 20 fields. Based on the results of a `limit 0` query, I can see that the inferred schema effectively drops all duplicate columns, so the data rows that follow do not have the expected number of cells.

### To Reproduce

Run queries against the linked CSV file.

### Expected behavior

I believe the expected behavior would be to either a) automatically rename those columns (adding `_{n}` perhaps), or b) provide a clear error that schemas with duplicate column names are not supported.

### Additional context

_No response_
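To make the two options under "Expected behavior" concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion or arrow-csv code: `dedupe_headers` and `find_duplicate` are hypothetical helper names, and the exact `_{n}` numbering (starting at 1 for the second occurrence) is an assumption on my part.

```rust
use std::collections::{HashMap, HashSet};

/// Option (a), hypothetical helper: keep the first occurrence of each header
/// as-is and rename later occurrences by appending `_{n}`.
fn dedupe_headers(headers: &[&str]) -> Vec<String> {
    // Last suffix tried for each original header name.
    let mut last_suffix: HashMap<&str, usize> = HashMap::new();
    // Every name already emitted, including generated ones.
    let mut used: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(headers.len());

    for &name in headers {
        let mut candidate = name.to_string();
        // Bump the suffix until the candidate does not clash with any
        // emitted name (this also covers a file that already has `a_1`).
        while used.contains(&candidate) {
            let counter = last_suffix.entry(name).or_insert(0);
            *counter += 1;
            let n = *counter;
            candidate = format!("{name}_{n}");
        }
        used.insert(candidate.clone());
        out.push(candidate);
    }
    out
}

/// Option (b), hypothetical helper: return the first header that repeats,
/// so the caller can report a clear "duplicate column name" error.
fn find_duplicate<'a>(headers: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    headers.iter().copied().find(|name| !seen.insert(*name))
}
```

With either approach the schema would keep one field per header cell, so the field count of the header would match the data rows instead of shrinking when names repeat.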