findepi commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2902451938
It's very natural to think of file-level vs. table-level type handling as the same thing as SQL coercions, but there is an important distinction: SQL has its own semantics, and the table provider has its own semantics. The distinction is easier to see in systems where it's not Arrow everywhere and the SQL side and the table provider side are cleanly delimited by their different type systems.

From Parquet to table level, the semantics of the operation are defined by a read. What happens if a file has `col1: Int8` but the table defines it as `Int32`? Well, nothing unusual: Int8 is widened to Int32 (infallibly). There is "a cast" (an equivalent of a SQL cast) happening inside the table provider. If a query comes with a filter (in Int32 terms), the filter _may_ be translated to file-level `col1` (Int8) terms by an equivalent of the unwrap cast optimization (yes, separate code).
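
To make the two directions concrete, here is a minimal std-only sketch, not DataFusion's actual code. The names `widen`, `FileFilter`, `translate_eq`, and `scan` are hypothetical and exist only for illustration, and the sketch assumes an equality filter and no NULLs. Reading widens Int8 to Int32 and cannot fail, while translating an Int32 filter literal down to Int8 can fail, in which case no file value can match.

```rust
use std::convert::TryFrom;

/// Read side: widening a file-level Int8 value to the table-level Int32
/// is infallible, so there is no error path.
fn widen(file_value: i8) -> i32 {
    i32::from(file_value)
}

/// A table-level filter `col1 = literal`, re-expressed in file-level (Int8) terms.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum FileFilter {
    /// Compare the raw Int8 column against this literal.
    Eq(i8),
    /// The Int32 literal has no Int8 representation, so no row can match.
    NeverMatches,
}

/// Filter side: narrowing the Int32 literal to Int8 is fallible.
fn translate_eq(literal: i32) -> FileFilter {
    match i8::try_from(literal) {
        Ok(v) => FileFilter::Eq(v),
        Err(_) => FileFilter::NeverMatches,
    }
}

/// Apply the translated filter to raw file values, then widen the survivors
/// to the table-level type.
fn scan(file_values: &[i8], literal: i32) -> Vec<i32> {
    match translate_eq(literal) {
        FileFilter::Eq(v) => file_values
            .iter()
            .filter(|x| **x == v)
            .map(|x| widen(*x))
            .collect(),
        FileFilter::NeverMatches => Vec::new(),
    }
}
```

The point of the sketch is that the widening lives on the read path and the narrowing lives on the filter path, and neither needs to go through SQL coercion rules. Comparisons other than equality would need more careful handling of out-of-range literals than shown here.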