alamb commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3347244624
I tried the reproducer from https://github.com/apache/datafusion/issues/17516 and it still fails on this PR. Maybe I don't understand how to use it 🤔

```sql
> create external table foo stored as csv location '/Users/andrewlamb/Downloads/services' options ('truncated_rows' true);
0 row(s) fetched.
Elapsed 0.021 seconds.

> select * from foo limit 10;
Arrow error: Csv error: incorrect number of fields for line 1, expected 17 got 20
```

It also errors if I just try to read the directory directly:

```sql
(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo run --bin datafusion-cli
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/datafusion-cli`
DataFusion CLI v50.0.0
> select * from '/Users/andrewlamb/Downloads/services' limit 10;
Arrow error: Csv error: incorrect number of fields for line 1, expected 17 got 20
```

This PR seems like a step in the right direction to me, it just doesn't seem to fix the problem.

It sounds like (as follow on issues / PRs) we probably would need to:

1. Enable schema merging for CSV by default (
2. Implement schema merge using column names (not positions), which is how parquet works, and I think what users would expect.
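
To make the name-vs-position distinction in item 2 concrete, here is a minimal Python sketch. It is not DataFusion code: `merge_by_name`, `align_row`, the example schemas, and the "fall back to `Utf8` on a type conflict" rule are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Ordered (column name, type name) pairs; a stand-in for a real schema type.
Schema = List[Tuple[str, str]]


def merge_by_name(schemas: List[Schema]) -> Schema:
    """Union the columns of all schemas, matching on name, in first-seen order."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, dtype in schema:
            if name not in merged:
                merged[name] = dtype
            elif merged[name] != dtype:
                # Hypothetical conflict rule: widen to a string type.
                merged[name] = "Utf8"
    return list(merged.items())


def align_row(
    row: List[str], file_schema: Schema, merged: Schema
) -> List[Optional[str]]:
    """Place one file's row into the merged layout by column name.

    Columns the file does not have are filled with None (null).
    """
    by_name = {name: value for (name, _), value in zip(file_schema, row)}
    return [by_name.get(name) for name, _ in merged]


# Two files whose columns differ in both count and order.
file_a: Schema = [("id", "Int64"), ("name", "Utf8")]
file_b: Schema = [("id", "Int64"), ("region", "Utf8"), ("name", "Utf8")]

merged = merge_by_name([file_a, file_b])
row_from_a = align_row(["2", "api"], file_a, merged)
row_from_b = align_row(["1", "us", "svc"], file_b, merged)
```

With positional matching, the second column of `file_b` (`region`) would be read as `name`. Matching on names keeps each value under its own column and pads the columns a file lacks with nulls.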
