Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

via GitHub Wed, 01 Oct 2025 09:32:04 -0700


adriangb commented on issue #16841:
URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3357203060


   > This is essentially `datafusion.execution.target_partitions`, right?
   
   Yes exactly.
   
   The issue is this: say you set target partitions to 8 because you have 8 CPU
cores. For a very selective query that returns 1 row but has to prune hundreds
of files / row groups (using Parquet stats, predicate pushdown, etc.), the
bottleneck is going to be IO / latency to object storage. Setting target
partitions to, say, 128 will make that query go *a lot* faster. But if you set
that permanently, queries that are CPU-bound (e.g. the same query with no
filter) will be much slower or OOM your machine.
   
   
   
   


