pepijnve commented on issue #16490:
URL: https://github.com/apache/datafusion/issues/16490#issuecomment-2992829388

   I don't think I'm legally allowed to copy/paste from that guide, but I think 
I can paraphrase.
   
   Things might actually be pretty reasonable as is today. In a sense, every 
Tokio task 'tick' is a discrete chunk of work. There are actually lots of tasks 
in flight at the same time when a query is running, definitely more than the 
number of worker threads unless the query is quite simple. This might already be sufficient. The 
yielding work that was done may actually help in this regard: it shortens 
each 'tick', which creates more schedulable chunks of work.
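
   To illustrate the point about yielding, here is a minimal, std-only sketch 
(not DataFusion or Tokio code). `Chunked`, `NoopWake` and `count_ticks` are 
hypothetical names, and the "budget" is an assumed stand-in for whatever makes 
an operator yield. It counts how many polls ('ticks') a fixed amount of work is 
split into for a given budget.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical unit of work: performs `total` steps, but returns
/// `Poll::Pending` after at most `budget` steps per poll.
struct Chunked {
    done: u64,
    total: u64,
    budget: u64,
}

impl Future for Chunked {
    type Output = u64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // Do at most `budget` steps in this poll.
        let end = (self.done + self.budget).min(self.total);
        while self.done < end {
            self.done += 1; // stand-in for processing one batch
        }
        if self.done == self.total {
            Poll::Ready(self.done)
        } else {
            // Cooperative yield: ask to be polled again, then give up the thread.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the loop below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the work to completion and returns the number of polls ("ticks").
fn count_ticks(total: u64, budget: u64) -> u32 {
    assert!(budget > 0, "budget must be positive");
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(Chunked { done: 0, total, budget });
    let mut ticks = 0;
    loop {
        ticks += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            return ticks;
        }
    }
}
```

   With 1000 steps of work, a budget of 1000 gives a single long tick, while a 
budget of 100 gives ten short ones. Each boundary between ticks is a point 
where a real scheduler could run another task instead.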
   
   Making the partitioning strategies more load-aware might still help, 
though, by avoiding congestion in partitions that for whatever reason are 
running a bit slower than the others.
   I will say that I don't yet understand how the hash-based partitioning 
percolates through the pipeline. Is that essentially a local decision for the 
repartition operator, or does it have consequences further down the line in 
parent operators as well?
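
   To make the congestion concern concrete, here is a minimal sketch of 
hash-based partitioning in general. It is not DataFusion's implementation: 
`partition_for` and `partition_sizes` are hypothetical helpers, and the 
standard library's `DefaultHasher` is only an assumed stand-in for the real 
hash function. Each row goes to `hash(key) % num_partitions`, so the target 
partition depends only on the key and never on how busy that partition is.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: maps a key to a partition index via hash modulo.
fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Hypothetical helper: counts how many rows land in each partition.
fn partition_sizes<K: Hash>(keys: &[K], num_partitions: usize) -> Vec<usize> {
    let mut sizes = vec![0; num_partitions];
    for key in keys {
        sizes[partition_for(key, num_partitions)] += 1;
    }
    sizes
}
```

   Equal keys always land in the same partition, so a heavily skewed key 
column sends most rows to a single partition regardless of load. A load-aware 
strategy would have to relax that mapping somehow, and it is unclear to me 
whether parent operators would tolerate that.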


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


