adriangb commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3357203060
> This is essentially `datafusion.execution.target_partitions`, right?

Yes, exactly. The issue is this: suppose you set target partitions to 8 because you have 8 CPU cores. For a very selective query that returns 1 row but has to prune hundreds of files / row groups (using Parquet stats, predicate pushdown, etc.), the bottleneck is going to be IO / latency to object storage. Setting target partitions to, say, 128 will make the query go *a lot* faster. But if you set that permanently, queries that are CPU-bound (e.g. the same query with no filter) will be much slower or OOM your machine.
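To make the tradeoff concrete, here is a deliberately crude toy model, not anything DataFusion computes. It assumes IO-bound work is pure per-request latency issued in waves of `partitions` concurrent requests, CPU-bound work scales only up to the core count, and each partition holds its own fixed-size buffers. The function names and all numbers (800 requests, 50 ms latency, 8 cores, 64 CPU-seconds, 64 MB per partition) are hypothetical.

```python
import math


def io_bound_time_s(requests: int, latency_s: float, partitions: int) -> float:
    """Toy wall time when each request is pure latency (e.g. object-store GETs)."""
    # Requests go out in waves of `partitions` at a time; each wave costs one latency.
    return math.ceil(requests / partitions) * latency_s


def cpu_bound_time_s(cpu_seconds: float, cores: int, partitions: int) -> float:
    """Toy wall time when work is pure CPU: parallelism is capped by the core count."""
    return cpu_seconds / min(cores, partitions)


def peak_buffer_mb(partitions: int, per_partition_mb: float) -> float:
    """Toy memory footprint if every partition keeps its own buffers alive."""
    return partitions * per_partition_mb


# Hypothetical workload: 800 metadata/row-group fetches at 50 ms each on 8 cores.
for p in (8, 128):
    print(
        f"partitions={p}: "
        f"io≈{io_bound_time_s(800, 0.05, p):.2f}s "
        f"cpu≈{cpu_bound_time_s(64.0, 8, p):.2f}s "
        f"mem≈{peak_buffer_mb(p, 64.0):.0f}MB"
    )
```

Under these made-up numbers, going from 8 to 128 partitions cuts the IO-bound estimate from about 5 s to about 0.35 s, leaves the CPU-bound estimate unchanged at 8 s, and multiplies the buffer estimate by 16. That mirrors why a single fixed setting is a poor fit for both kinds of query.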
