tustvold commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3708332419
Yeah, it's a good point that whilst caching reduces the additional decode costs for pushing down predicates, it doesn't eliminate the IO costs. That being said in general you only really want to be pushing down one or two ArrowPredicate as each has costs associated with it, with this just becoming even more true against object stores.

One has 3 choices with what to do with a given predicate:

1. Push it down as-is as an ArrowPredicate
2. Fuse it with some other predicates into a combined ArrowPredicate
3. Don't push the predicate down at all

If you have multiple ArrowPredicate you then have the additional complexity of what order to apply them in. I'm not familiar with how DF is handling this currently, but a selectivity estimate based approach at plan time might be a good place to start.
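
To make the three choices and the ordering question concrete, here is a rough sketch of what a selectivity estimate based decision at plan time could look like. It is purely illustrative: `Candidate`, `Fused`, `Plan`, `plan_pushdown`, `example` and both thresholds are hypothetical names invented for this sketch, it does not use the real `ArrowPredicate` trait or any DataFusion API, and it assumes selectivity estimates (fraction of rows kept) are already available and independent of each other.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of one filter expression at plan time.
#[derive(Debug, Clone)]
struct Candidate {
    name: &'static str,         // label for the expression, illustration only
    columns: Vec<&'static str>, // columns that must be read to evaluate it
    selectivity: f64,           // estimated fraction of rows kept, 0.0..=1.0
}

/// One pushed-down predicate, possibly fused from several candidates.
#[derive(Debug, Clone, PartialEq)]
struct Fused {
    names: Vec<&'static str>,
    selectivity: f64,
}

#[derive(Debug, PartialEq)]
struct Plan {
    pushed: Vec<Fused>,          // in the order they would be applied
    residual: Vec<&'static str>, // evaluated after the scan instead
}

/// Assumed heuristic: fuse candidates that read the same columns, apply the
/// most selective groups first, and push down at most `max_pushed` groups
/// whose estimated selectivity is at or below `max_selectivity`.
fn plan_pushdown(candidates: &[Candidate], max_pushed: usize, max_selectivity: f64) -> Plan {
    // Choice 2: fuse predicates over the same column set, since evaluating
    // them together needs no extra IO or decode.
    let mut groups: BTreeMap<Vec<&'static str>, Fused> = BTreeMap::new();
    for c in candidates {
        let mut key = c.columns.clone();
        key.sort();
        key.dedup();
        let group = groups.entry(key).or_insert(Fused {
            names: Vec::new(),
            selectivity: 1.0,
        });
        group.names.push(c.name);
        // Independence assumption: combined selectivity is the product.
        group.selectivity *= c.selectivity;
    }

    // Ordering: lowest estimated selectivity (fewest rows kept) first.
    let mut fused: Vec<Fused> = groups.into_values().collect();
    fused.sort_by(|a, b| a.selectivity.total_cmp(&b.selectivity));

    let mut pushed = Vec::new();
    let mut residual = Vec::new();
    for f in fused {
        if pushed.len() < max_pushed && f.selectivity <= max_selectivity {
            pushed.push(f); // choice 1 (single candidate) or 2 (fused group)
        } else {
            residual.extend(f.names); // choice 3: don't push down at all
        }
    }
    Plan { pushed, residual }
}

/// Usage example with made-up estimates.
fn example() -> Plan {
    let candidates = vec![
        Candidate { name: "a > 10", columns: vec!["a"], selectivity: 0.5 },
        Candidate { name: "a < 20", columns: vec!["a"], selectivity: 0.1 },
        Candidate { name: "b = 'x'", columns: vec!["b"], selectivity: 0.01 },
        Candidate { name: "c IS NOT NULL", columns: vec!["c"], selectivity: 0.95 },
        Candidate { name: "d LIKE 'x%'", columns: vec!["d"], selectivity: 0.3 },
    ];
    plan_pushdown(&candidates, 2, 0.5)
}
```

With these made-up numbers the sketch pushes down the predicate on `b` first, then the fused pair on `a`, and leaves the predicates on `d` and `c` to be evaluated after the scan. A real implementation would also want to weigh the IO cost of the columns each predicate reads, which this sketch ignores.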
