adriangb commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3708272050
> The new parquet pushdown sort of does this IIUC, but at the physical execution level - i.e. after the IO strategy is somewhat baked in

AFAIK the only thing along these lines we have today is the re-ordering of filters based on the sum of the column sizes for the columns each filter references. Filter pushdown is all or nothing; there is no per-filter selectivity analysis. And I don't see how we could do it at plan time: statistics are generally not enough to estimate filter selectivity (unless the statistics system gets an overhaul, which [might just happen](https://github.com/apache/datafusion/issues/19487)).

What I am proposing is to do this dynamically, at runtime. After the filters are evaluated for each `RecordBatch`, we re-order them and possibly toss the ones with poor selectivity back into the scan phase. This would require some API changes on the arrow-rs side, e.g. to track selectivity metrics, but I don't think arrow-rs has to coordinate the cross-scan metrics, etc.
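To make the runtime re-ordering idea concrete, here is a minimal sketch in Rust. All names (`FilterStats`, `reorder_filters`) are hypothetical and not part of DataFusion or arrow-rs; the point is only the mechanism: track per-filter pass rates as batches flow through, then sort so the most selective filter is evaluated first on subsequent batches.

```rust
// Hypothetical sketch: per-filter selectivity tracking and re-ordering.
// None of these types exist in DataFusion/arrow-rs; they illustrate the idea.

#[derive(Debug)]
struct FilterStats {
    id: usize,     // identifies the filter expression
    rows_in: u64,  // rows fed into this filter so far
    rows_out: u64, // rows that passed this filter so far
}

impl FilterStats {
    /// Observed fraction of rows that pass. Lower means more selective,
    /// so running it earlier prunes more rows for the filters that follow.
    fn selectivity(&self) -> f64 {
        if self.rows_in == 0 {
            1.0 // no data yet: assume worst case
        } else {
            self.rows_out as f64 / self.rows_in as f64
        }
    }
}

/// After each batch (or every N batches), sort filters so the most
/// selective one runs first. A poorly selective filter could instead be
/// handed back to the scan phase, which this sketch omits.
fn reorder_filters(stats: &mut [FilterStats]) {
    stats.sort_by(|a, b| {
        a.selectivity()
            .partial_cmp(&b.selectivity())
            .expect("selectivity is never NaN")
    });
}

fn main() {
    let mut stats = vec![
        FilterStats { id: 0, rows_in: 1000, rows_out: 900 }, // weak: keeps 90%
        FilterStats { id: 1, rows_in: 1000, rows_out: 50 },  // strong: keeps 5%
    ];
    reorder_filters(&mut stats);
    // The strong filter (id 1) should now be evaluated first.
    assert_eq!(stats[0].id, 1);
    println!("new filter order: {:?}", stats.iter().map(|s| s.id).collect::<Vec<_>>());
}
```

The counters would be updated where filters are actually evaluated, which is why the comment above notes that arrow-rs would need some API surface for reporting these metrics, while the cross-scan aggregation and re-ordering policy could live entirely in DataFusion.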
