Rafferty97 commented on issue #12852: URL: https://github.com/apache/datafusion/issues/12852#issuecomment-2469474930
Having thought about it some more, I think the use of `Schema::try_merge` is actually incorrect for CSV files, because the CSV reading process assumes that the fields in the `Schema` appear in the same order as the columns in the file. So if two CSVs are read in with the same columns in a different order, data will end up in the wrong columns. This may produce an error if the types mismatch, but it could also silently return bogus data. My intuition is that the code needs to change to merge CSV schemas by field index rather than by field name.
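To make the failure mode concrete, here is a small self-contained Rust sketch. `Field`, `DataType`, `field`, `merge_by_name`, `merge_by_index` and `read_row` are hypothetical, simplified stand-ins, not the arrow-rs or DataFusion APIs. `merge_by_name` only approximates the by-name behaviour of `Schema::try_merge`, and `merge_by_index` shows one possible positional policy (assumed here: reject any file whose column i differs in name or type from column i of the first file).

```rust
// Hypothetical, simplified stand-ins for arrow's `Field`/`DataType`; NOT the arrow-rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Merge keyed by field NAME (roughly how a by-name merge such as
/// `Schema::try_merge` behaves): each name appears once, in order of first
/// appearance; the same name with two different types is an error.
fn merge_by_name(schemas: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let mut merged: Vec<Field> = Vec::new();
    for schema in schemas {
        for f in schema {
            // Copy out the existing type (if any) so no borrow of `merged` is held.
            let existing = merged.iter().find(|m| m.name == f.name).map(|m| m.data_type);
            match existing {
                Some(t) if t != f.data_type => {
                    return Err(format!("type conflict for column {:?}", f.name));
                }
                Some(_) => {}
                None => merged.push(f.clone()),
            }
        }
    }
    Ok(merged)
}

/// Merge keyed by field INDEX (one possible policy): column i of every file
/// must have the same name and type as column i of the first file.
fn merge_by_index(schemas: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let first = schemas.first().ok_or("no schemas to merge")?;
    for schema in &schemas[1..] {
        if schema.len() != first.len() {
            return Err(format!("expected {} columns, found {}", first.len(), schema.len()));
        }
        for (i, (a, b)) in first.iter().zip(schema.iter()).enumerate() {
            if a != b {
                return Err(format!("column {}: {:?} vs {:?}", i, a.name, b.name));
            }
        }
    }
    Ok(first.clone())
}

/// Models a CSV reader: values are assigned to schema fields strictly by position.
fn read_row(schema: &[Field], row: &[&str]) -> Vec<(String, String)> {
    schema
        .iter()
        .zip(row.iter())
        .map(|(f, v)| (f.name.clone(), v.to_string()))
        .collect()
}
```

With headers `a,b` and `b,a` (both `Int64`), the by-name merge yields `[a, b]`, so the row `2,1` from the second file (meaning `b = 2`, `a = 1`) is read as `a = 2`, `b = 1`: the values are swapped and no error is raised. The index-based merge rejects that pair of files instead.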
