adriangb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2902428819
> > "casting should not be changed after planning"
>
> If I said exactly that I should stand corrected. _Coercions_ are something that should be applied during the analysis/initial planning phase. Coercion rules result in casts being inserted into the plan. After the initial plan is fully formed, the word "coercion" does not exist anymore.
>
> Casts are in the same category as function calls -- the optimizer may reorganize or replace function calls with other expressions as long as they are _equivalent_ (and are believed to "be better"). Casts can be removed or replaced the same way (again: as long as the resulting expression is well formed and equivalent).

Thanks for correcting me! That's the sort of distinction I knew you'd be able to make that I was lacking. It's a helpful way to think about it.

> from the issue description:
>
> > So when the filter gets into ParquetSource it's an Int32 filter. But when we read the file schema it's actually an Int8!
>
> Where does Int8 come back?
>
> Anyway, as the example shows, two different files may have two different internal representations for the same SQL-level column. I.e. the table may declare Int64, but the file may contain Int32 or Int16. (This is not limited to various Int bitnesses.) The Parquet source, which deals with individual files, may perform logic similar to the `unwrap_cast` optimizer. Does it matter though?

That's the point: we need to do logic similar to `unwrap_cast`, which is non-trivial I think. I started down that road, got confused about some cases, and backed out. For example, if you have a table schema of `col1: Int64, col2: Int64` and a predicate `col1 = col2`, there will be no casts at the logical level. But when you get to the file level and the file schema is `col1: Int8, col2: UInt32`, you now have to do something closer to coercion, I think (i.e. introduce some casts)? What would you suggest we do in this case?
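To make the `col1 = col2` case concrete, here is a minimal sketch of one possible answer: wrap every column whose file type differs from its table type in a cast back to the table type. Everything in it is an assumption for illustration. `DataType`, `Expr`, `Schema`, `adapt_to_file`, and `example` are hypothetical toy definitions, not the arrow or DataFusion APIs, and this is not necessarily the approach the project settled on.

```rust
use std::collections::HashMap;

/// Toy stand-in for a few Arrow data types (hypothetical, not the arrow crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    UInt32,
    Int64,
}

/// Toy expression tree (hypothetical, not DataFusion's expression types).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast(Box<Expr>, DataType),
    Eq(Box<Expr>, Box<Expr>),
}

/// Column name -> type. Used for both the table schema and a file schema.
type Schema = HashMap<String, DataType>;

/// Rewrite a predicate planned against `table` so it is well typed against `file`.
/// A column whose file type differs from its table type is wrapped in a cast
/// to the table type. Returns `None` if a column is missing from either schema.
fn adapt_to_file(expr: &Expr, table: &Schema, file: &Schema) -> Option<Expr> {
    Some(match expr {
        Expr::Column(name) => {
            let table_ty = *table.get(name)?;
            let file_ty = *file.get(name)?;
            if table_ty == file_ty {
                expr.clone()
            } else {
                Expr::Cast(Box::new(expr.clone()), table_ty)
            }
        }
        // Existing casts are kept; only their input is adapted.
        Expr::Cast(inner, ty) => Expr::Cast(Box::new(adapt_to_file(inner, table, file)?), *ty),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(adapt_to_file(l, table, file)?),
            Box::new(adapt_to_file(r, table, file)?),
        ),
    })
}

/// Small helper to build a schema from (name, type) pairs.
fn schema(cols: &[(&str, DataType)]) -> Schema {
    cols.iter().map(|(n, t)| (n.to_string(), *t)).collect()
}

/// The case from the comment: table is (Int64, Int64), file is (Int8, UInt32).
fn example() -> Option<Expr> {
    let table = schema(&[("col1", DataType::Int64), ("col2", DataType::Int64)]);
    let file = schema(&[("col1", DataType::Int8), ("col2", DataType::UInt32)]);
    let predicate = Expr::Eq(
        Box::new(Expr::Column("col1".into())),
        Box::new(Expr::Column("col2".into())),
    );
    adapt_to_file(&predicate, &table, &file)
}
```

Under these assumptions, `example()` turns `col1 = col2` into `CAST(col1 AS Int64) = CAST(col2 AS Int64)`. Casting up to the table type avoids having to pick a common type for `Int8` and `UInt32` at the file level, at the cost of casting column data. An `unwrap_cast`-style pass could then still remove casts in the column-versus-literal cases.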
Basically I think we need to all agree that this complexity is the right way to go and then agree on what to do in the different scenarios.