jayzhan211 commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800373423
> PhysicalExpr::with_schema

This method is too general, and it is unclear what each PhysicalExpr should do with the provided schema, so I don't think it is a good idea.

> I suspect the hard bit with this approach will be edge cases: what if a filter cannot adapt itself to the file schema, but we could cast the column to make it work? I'm thinking something like a UDF that only accepts Utf8 but the file produces Utf8View

I think it is unavoidable that we need to cast the columns to be able to evaluate the filter.

Another question: isn't the filter created based on the table schema? The batch is then read with the file schema, cast to the table schema, and evaluated by the filter.

What we could do is rewrite the filter based on the file schema. Assume we have `cast(a, i64) = 100`, where `a` is i32 in the table schema and i64 in the file schema. We rewrite it to `cast(cast(a, i32), i64) = 100` and then optimize it to `a = 100`.

In your example where the UDF only accepts Utf8, we know there is no optimization we can do, so we just end up with an additional cast from the file schema to the table schema.
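To make the proposed rewrite concrete, here is a minimal sketch using a toy expression type. It does not use DataFusion's actual `PhysicalExpr` API: `Expr`, `DataType`, `adapt_to_file`, and `simplify` are hypothetical names made up for illustration. It assumes the filter is built against the table schema, that each column reference is then wrapped in a cast from the file type to the table type, and that a cast chain ending at the type it started from can be collapsed. Whether dropping a narrowing cast is always safe (for example with out-of-range values) is not addressed here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Boolean,
}

// Toy expression tree; a stand-in for a real physical expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Literal(i64),
    Cast(Box<Expr>, DataType),
    Eq(Box<Expr>, Box<Expr>),
}

fn type_of(e: &Expr) -> DataType {
    match e {
        Expr::Column(_, t) | Expr::Cast(_, t) => *t,
        Expr::Literal(_) => DataType::Int64,
        Expr::Eq(_, _) => DataType::Boolean,
    }
}

// Step 1: the filter refers to columns with their table-schema types.
// Where the file stores a column with a different type, read it with
// the file type and cast it back to the table type.
fn adapt_to_file(e: Expr, file: &HashMap<String, DataType>) -> Expr {
    match e {
        Expr::Column(name, table_ty) => match file.get(&name) {
            Some(&file_ty) if file_ty != table_ty => {
                Expr::Cast(Box::new(Expr::Column(name, file_ty)), table_ty)
            }
            _ => Expr::Column(name, table_ty),
        },
        Expr::Cast(inner, ty) => Expr::Cast(Box::new(adapt_to_file(*inner, file)), ty),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(adapt_to_file(*l, file)),
            Box::new(adapt_to_file(*r, file)),
        ),
        lit => lit,
    }
}

// Step 2: drop casts that are no-ops, and collapse cast(cast(x, _), t)
// to x when x already has type t (assumption: the round trip is lossless).
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Cast(inner, ty) => {
            let inner = simplify(*inner);
            if type_of(&inner) == ty {
                return inner;
            }
            match inner {
                Expr::Cast(x, _) if type_of(&x) == ty => *x,
                other => Expr::Cast(Box::new(other), ty),
            }
        }
        Expr::Eq(l, r) => Expr::Eq(Box::new(simplify(*l)), Box::new(simplify(*r))),
        other => other,
    }
}

fn main() {
    // Table schema: a is i32. File schema: a is i64.
    let file = HashMap::from([("a".to_string(), DataType::Int64)]);

    // Filter as planned against the table schema: cast(a, i64) = 100
    let filter = Expr::Eq(
        Box::new(Expr::Cast(
            Box::new(Expr::Column("a".to_string(), DataType::Int32)),
            DataType::Int64,
        )),
        Box::new(Expr::Literal(100)),
    );

    // cast(cast(a, i32), i64) = 100, then a = 100 over the file's i64 column.
    let rewritten = simplify(adapt_to_file(filter, &file));
    let expected = Expr::Eq(
        Box::new(Expr::Column("a".to_string(), DataType::Int64)),
        Box::new(Expr::Literal(100)),
    );
    assert_eq!(rewritten, expected);
    println!("{:?}", rewritten);
}
```

For an expression that only accepts the table type (the Utf8-only UDF case), `simplify` in this sketch would find nothing to collapse, and the inserted cast from the file type to the table type would simply remain.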