adriangb commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800379042
> Another question is, isn't the filter created based on the table schema? And then the batch is read with the file schema, cast to the table schema, and evaluated by the filter. Yes, this is exactly the case. What we could do is rewrite the filter based on the file schema. Assume we have `cast(a, i64) = 100`, where `a` is i32 in the table schema and i64 in the file schema. We rewrite it to `cast(cast(a, i32), i64) = 100` and then optimize it to `a = 100`.

Yes, that is exactly what I am proposing above; the method by which it happens is not that important to me.

The other point is whether we can use this same mechanism to handle shredding for the variant type. In other words, can we "optimize" `variant_get(col, 'key')` to `col.typed_value.key` if we know from the file schema that `key` is shredded for this specific file?

And if that all makes sense... how do we do those optimizations? Is it something like an optimizer that has to downcast-match on the expressions, or do we add methods to `PhysicalExpr` so each expression can describe how it handles this behavior?
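The cast rewrite described in the comment (`cast(cast(a, i32), i64) = 100` optimized to `a = 100`) can be sketched on a toy expression tree. This is illustrative Rust only: the `DataType` and `Expr` enums and the `simplify` function are hypothetical stand-ins, not DataFusion's actual `PhysicalExpr` API, and a real optimizer would also have to prove the collapsed round-trip cast is lossless before removing it (i64 → i32 → i64 truncates out-of-range values):

```rust
// Toy model of the proposed filter rewrite: collapse a redundant
// cast round-trip so a table-schema filter can be evaluated directly
// on file-schema data. All names here are hypothetical.

#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Boolean,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column { name: String, file_type: DataType },
    Cast { input: Box<Expr>, to: DataType },
    Eq { left: Box<Expr>, right: Box<Expr> },
    Literal(i64),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column { file_type, .. } => file_type.clone(),
            Expr::Cast { to, .. } => to.clone(),
            Expr::Eq { .. } => DataType::Boolean,
            Expr::Literal(_) => DataType::Int64,
        }
    }
}

/// Bottom-up rewrite that removes no-op and round-trip casts.
/// A production version must also check that the collapsed cast
/// chain cannot lose information for the values involved.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { input, to } => {
            let input = simplify(*input);
            match input {
                // cast(cast(e, _), T) where e already has type T.
                Expr::Cast { input: inner, .. } if inner.data_type() == to => *inner,
                // cast(e, T) where e already has type T is a no-op.
                other if other.data_type() == to => other,
                other => Expr::Cast { input: Box::new(other), to },
            }
        }
        Expr::Eq { left, right } => Expr::Eq {
            left: Box::new(simplify(*left)),
            right: Box::new(simplify(*right)),
        },
        other => other,
    }
}

fn main() {
    // `a` is i32 in the table schema but i64 in this file's schema.
    let col = Expr::Column { name: "a".to_string(), file_type: DataType::Int64 };
    // The table-schema filter `cast(a, i64) = 100`, re-expressed
    // against the file schema: cast(cast(a, i32), i64) = 100.
    let filter = Expr::Eq {
        left: Box::new(Expr::Cast {
            input: Box::new(Expr::Cast {
                input: Box::new(col.clone()),
                to: DataType::Int32,
            }),
            to: DataType::Int64,
        }),
        right: Box::new(Expr::Literal(100)),
    };
    // The rewrite leaves `a = 100`, evaluable on the raw file data.
    let expected = Expr::Eq {
        left: Box::new(col),
        right: Box::new(Expr::Literal(100)),
    };
    assert_eq!(simplify(filter), expected);
    println!("ok");
}
```

This sketch takes the "optimizer that downcast-matches on expressions" shape from the comment's closing question; the alternative design, a method on each `PhysicalExpr` describing its own rewrite, would distribute the same logic across the expression implementations instead.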