alamb commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3275651109
> This makes sense with the filter, but to get the min value for the filter we still need to do a full scan; that is something I'm still missing. Let's go ahead, yes, thanks for the explanations.

Let's take the best case, which is:

* after reading the first batch from the first file, DataFusion has read the actual minimum value

While it is true that DataFusion still needs to check all remaining files to ensure this is actually the minimum value, it **may** not have to open, read, and decode the rows in those files. For example, it could potentially prune (skip) all remaining files using statistics.

And even if it can't prune an entire file, it may be able to prune row groups, or ranges of rows (if `pushdown_filters` is turned on).
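To make the pruning argument concrete, here is a minimal toy sketch in Python. It is not DataFusion's implementation or API: `RowGroup`, `DataFile`, and `find_min` are hypothetical names, and it assumes each row group carries a trustworthy minimum statistic that can be read without decoding its rows. It models only file-level and row-group-level skipping, not the finer row-range pruning that `pushdown_filters` enables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical row group: a cheap min statistic plus expensive-to-decode rows."""
    min_value: int
    rows: List[int]


@dataclass
class DataFile:
    """Hypothetical file; assumed to contain at least one row group."""
    row_groups: List[RowGroup]

    @property
    def min_value(self) -> int:
        # File-level statistic derived from row-group statistics (metadata only).
        return min(rg.min_value for rg in self.row_groups)


def find_min(files: List[DataFile]) -> Tuple[Optional[int], int]:
    """Return (minimum value, number of rows actually decoded)."""
    current_min: Optional[int] = None
    decoded = 0
    for f in files:
        # Every file is still *checked*, but only via its statistics.
        if current_min is not None and f.min_value >= current_min:
            continue  # the whole file cannot contain a smaller value
        for rg in f.row_groups:
            if current_min is not None and rg.min_value >= current_min:
                continue  # prune just this row group
            decoded += len(rg.rows)  # only now pay the decode cost
            rg_min = min(rg.rows)
            if current_min is None or rg_min < current_min:
                current_min = rg_min
    return current_min, decoded


# Best case from the comment: the first batch read already holds the minimum.
files = [
    DataFile([RowGroup(1, [1, 5, 9]), RowGroup(4, [4, 7])]),
    DataFile([RowGroup(3, [3, 8]), RowGroup(6, [6, 10])]),
    DataFile([RowGroup(2, [2, 11])]),
]
print(find_min(files))
```

In this best case the loop still visits every file, but after the first row group it consults only statistics, so the remaining files and row groups are skipped without being decoded.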
