Jefffrey commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3364010908
I think the problem is that the schema inference code only reads a certain number of input rows (configurable via `schema_infer_max_rec`). If the schema changes within those rows, the fix in this PR picks it up correctly. However, if the schema changes in rows *after* that limit, we hit the error that @alamb encountered.

This makes sense for the DuckDB example in the issue: the dataset is quite large, and the later files seem to change their number of columns, so the default schema inference window (1000 rows) never sees the schema change.
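To make the failure mode concrete, here is a toy sketch (in Python, not DataFusion's actual Rust implementation) of what a bounded inference window does: the schema is inferred only from the first `max_records` rows, so a column that first appears after that window is simply never observed. The `infer_schema` function and the `max_records` parameter are illustrative stand-ins for the real inference code and `schema_infer_max_rec`.

```python
import csv

def infer_schema(lines, max_records=1000):
    """Toy schema inference: take the widest row seen among the first
    max_records CSV rows (loosely analogous to schema_infer_max_rec)."""
    width = 0
    for i, row in enumerate(csv.reader(lines)):
        if i >= max_records:
            break
        width = max(width, len(row))
    return width

# Rows 0..4 have 2 columns; rows 5..9 have 3 (the "later files" case).
data = ["a,1"] * 5 + ["a,1,x"] * 5

# A window covering the change picks up the wider schema.
assert infer_schema(data, max_records=10) == 3

# A window that ends before the change misses the extra column, which
# is the situation described above when the dataset is large and the
# schema change happens past the inference limit.
assert infer_schema(data, max_records=5) == 2
```

Raising the window (the equivalent of increasing `schema_infer_max_rec`) works around the problem only as long as the schema change falls inside it, which is why a very large dataset can still fail with the default of 1000 rows.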
