Jefffrey commented on PR #17553: URL: https://github.com/apache/datafusion/pull/17553#issuecomment-3364010908
I think the problem is that the schema inference code only reads a certain number of input rows (configurable via `schema_infer_max_rec`). If the schema changes within those rows, the fix in this PR picks it up correctly. However, if the schema changes in rows *after* that limit, we hit the error that @alamb encountered.

This makes sense for the DuckDB example in the issue: the dataset is quite large, and the later files seem to change their number of columns, so the default schema inference window (1000 rows) never sees the schema change.
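To make the failure mode concrete, here is a toy sketch (in Python, not DataFusion's actual Rust implementation) of what a bounded inference window does: the schema is inferred only from the first `max_records` rows, so a column that first appears after that window is simply never observed. The `infer_schema` function and the `max_records` parameter are illustrative stand-ins for the real inference code and `schema_infer_max_rec`.

```python
import csv

def infer_schema(lines, max_records=1000):
    """Toy schema inference: take the widest row seen among the first
    max_records CSV rows (loosely analogous to schema_infer_max_rec)."""
    width = 0
    for i, row in enumerate(csv.reader(lines)):
        if i >= max_records:
            break
        width = max(width, len(row))
    return width

# Rows 0..4 have 2 columns; rows 5..9 have 3 (the "later files" case).
data = ["a,1"] * 5 + ["a,1,x"] * 5

# A window covering the change picks up the wider schema.
assert infer_schema(data, max_records=10) == 3

# A window that ends before the change misses the extra column, which
# is the situation described above when the dataset is large and the
# schema change happens past the inference limit.
assert infer_schema(data, max_records=5) == 2
```

Raising the window (the equivalent of increasing `schema_infer_max_rec`) works around the problem only as long as the schema change falls inside it, which is why a very large dataset can still fail with the default of 1000 rows.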
