alamb commented on PR #103:
URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3275651109

   > This makes sense with the filter, but to get the min value for the 
filter we still need a full scan; that is something I'm still missing. Let's go 
ahead, yes, thanks for the explanations
   
   Let's take the best case, which is 
   * after reading the first batch from the first file, DataFusion has read the 
actual minimum value
   
   While it is true that DataFusion still needs to check all remaining files to 
ensure this is actually the minimum value, it **may** not have to open, read, 
and decode the rows in those files -- for example, it could potentially 
prune (skip) all remaining files using statistics. And even if it can't prune 
an entire file, it may be able to prune row groups, or ranges of rows (if 
`pushdown_filters` is turned on).
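
   To make the best case concrete, here is a minimal sketch of file-level 
pruning with min/max statistics. It is plain Python, not DataFusion's actual 
implementation; `FileStats`, `files_to_read`, and the file names are 
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file min/max statistics (illustration only)."""
    name: str
    min_value: int
    max_value: int


def files_to_read(files, current_min):
    """Return the files that could still hold a value below current_min."""
    # A file whose smallest value is >= current_min cannot improve the
    # answer, so it can be skipped without opening or decoding it.
    return [f for f in files if f.min_value < current_min]


files = [
    FileStats("f1.parquet", 1, 50),
    FileStats("f2.parquet", 10, 90),
    FileStats("f3.parquet", 5, 20),
]

# Best case: the first batch of the first file already held the true minimum.
current_min = 1
remaining = files_to_read(files[1:], current_min)
print([f.name for f in remaining])  # prints []
```

   With a less lucky first batch (say the smallest value seen so far is 8), 
only `f3.parquet` would still need to be read. The same comparison could in 
principle be applied per row group rather than per file.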


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

