adriangb commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2800379042
> Another question is, isn't the filter created based on the table schema? And then the batch is read with the file schema, cast to the table schema, and evaluated by the filter. Yes, this is exactly the case. What we could do is rewrite the filter based on the file schema. Assume we have `cast(a, i64) = 100`, where `a` is i32 in the table schema and i64 in the file schema. We rewrite it to `cast(cast(a, i32), i64) = 100` and then optimize it to `a = 100`.

Yes, that is exactly what I am proposing above; the method by which it happens is not that important to me.

The other point is whether we can use this same mechanism to handle shredding for the variant type. In other words, can we "optimize" `variant_get(col, 'key')` to `col.typed_value.key` if we know from the file schema that `key` is shredded for this specific file?

And if that all makes sense... how do we do those optimizations? Is it something like an optimizer that has to downcast-match on the expressions, or do we add methods to `PhysicalExpr` so each expression can describe how it handles this behavior?
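The cast rewrite described in the comment (`cast(cast(a, i32), i64) = 100` optimized to `a = 100`) can be sketched on a toy expression tree. This is illustrative Rust only: the `DataType` and `Expr` enums and the `simplify` function are hypothetical stand-ins, not DataFusion's actual `PhysicalExpr` API, and a real optimizer would also have to prove the collapsed round-trip cast is lossless before removing it (i64 → i32 → i64 truncates out-of-range values):

```rust
// Toy model of the proposed filter rewrite: collapse a redundant
// cast round-trip so a table-schema filter can be evaluated directly
// on file-schema data. All names here are hypothetical.

#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Boolean,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column { name: String, file_type: DataType },
    Cast { input: Box<Expr>, to: DataType },
    Eq { left: Box<Expr>, right: Box<Expr> },
    Literal(i64),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column { file_type, .. } => file_type.clone(),
            Expr::Cast { to, .. } => to.clone(),
            Expr::Eq { .. } => DataType::Boolean,
            Expr::Literal(_) => DataType::Int64,
        }
    }
}

/// Bottom-up rewrite that removes no-op and round-trip casts.
/// A production version must also check that the collapsed cast
/// chain cannot lose information for the values involved.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { input, to } => {
            let input = simplify(*input);
            match input {
                // cast(cast(e, _), T) where e already has type T.
                Expr::Cast { input: inner, .. } if inner.data_type() == to => *inner,
                // cast(e, T) where e already has type T is a no-op.
                other if other.data_type() == to => other,
                other => Expr::Cast { input: Box::new(other), to },
            }
        }
        Expr::Eq { left, right } => Expr::Eq {
            left: Box::new(simplify(*left)),
            right: Box::new(simplify(*right)),
        },
        other => other,
    }
}

fn main() {
    // `a` is i32 in the table schema but i64 in this file's schema.
    let col = Expr::Column { name: "a".to_string(), file_type: DataType::Int64 };
    // The table-schema filter `cast(a, i64) = 100`, re-expressed
    // against the file schema: cast(cast(a, i32), i64) = 100.
    let filter = Expr::Eq {
        left: Box::new(Expr::Cast {
            input: Box::new(Expr::Cast {
                input: Box::new(col.clone()),
                to: DataType::Int32,
            }),
            to: DataType::Int64,
        }),
        right: Box::new(Expr::Literal(100)),
    };
    // The rewrite leaves `a = 100`, evaluable on the raw file data.
    let expected = Expr::Eq {
        left: Box::new(col),
        right: Box::new(Expr::Literal(100)),
    };
    assert_eq!(simplify(filter), expected);
    println!("ok");
}
```

This sketch takes the "optimizer that downcast-matches on expressions" shape from the comment's closing question; the alternative design, a method on each `PhysicalExpr` describing its own rewrite, would distribute the same logic across the expression implementations instead.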