Rachelint commented on issue #11451:
URL: https://github.com/apache/datafusion/issues/11451#issuecomment-2227415311

   Seems to be a good supplement to the `coop_budget` in `tokio`!
   
   Actually, we are hitting a tail-latency problem in production: heavy 
queries block the scheduler and cause the light ones to time out. This 
feature may help a lot.
   
   But I still don't quite understand the statement `I think this is probably 
not a big issue if you are setting the partition parallelism to the number` 
made above.
   
   Assume a machine with 8 cores, and we set the parallelism to 8.
   The query is:
   ```sql
   SELECT `c1`, COUNT(*)
   FROM `test`
   WHERE `time` >= '2024-07-12 16:51:24' AND `time` < '2024-07-12 17:51:24'
   GROUP BY `c1`
   ```
   It will be translated to a physical plan like:
   ```
     AggregateExec: mode=FinalPartitioned, gby=[c1], aggr=[COUNT(UInt8(1))]
       CoalesceBatchesExec: target_batch_size=8192
         RepartitionExec: partitioning=Hash(c1, 8), input_partitions=8
           AggregateExec: mode=Partial, gby=[c1], aggr=[COUNT(UInt8(1))]
             ProjectionExec: expr=[c1]
               TableScan
   ```
   Couldn't the following scenario still occur?
   Suppose I split the physical plan above into two stages, and assume that 
the `first stage` is `io bound` while the `second stage` is `cpu bound` and 
therefore blocks the tokio scheduler (a simplifying assumption that may not 
entirely reflect reality).
   - The first stage
   ```
     AggregateExec: mode=Partial, gby=[c1], aggr=[COUNT(UInt8(1))]
       ProjectionExec: expr=[c1]
         TableScan
   ```
   - The second stage
   ```
     AggregateExec: mode=FinalPartitioned, gby=[c1], aggr=[COUNT(UInt8(1))]
       CoalesceBatchesExec: target_batch_size=8192
         RepartitionExec: partitioning=Hash(c1, 8), input_partitions=8
   ```
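
To make the concern concrete, here is a toy model of one worker thread. It is only a sketch and uses neither DataFusion nor tokio: `Task`, `simulate`, and the tick-based clock are hypothetical names made up for illustration. It compares a `cpu bound` task that never yields with one that gives up the worker after a fixed budget of ticks.

```rust
use std::collections::VecDeque;

/// A task in the toy model: a name plus the CPU work it still needs, in abstract ticks.
struct Task {
    name: &'static str,
    remaining: u64,
}

/// Runs tasks on one simulated worker and returns (name, finish_tick) in completion order.
/// `budget` is the most ticks a task may run before it goes to the back of the queue;
/// `None` means a task keeps the worker until it is done.
fn simulate(tasks: Vec<Task>, budget: Option<u64>) -> Vec<(&'static str, u64)> {
    assert!(budget != Some(0), "a zero budget would never make progress");
    let mut queue: VecDeque<Task> = tasks.into();
    let mut clock = 0u64;
    let mut finished = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        // Run for the whole remaining work, or for at most one budget slice.
        let slice = match budget {
            Some(b) => b.min(task.remaining),
            None => task.remaining,
        };
        clock += slice;
        task.remaining -= slice;
        if task.remaining == 0 {
            finished.push((task.name, clock));
        } else {
            // Budget exhausted: yield so queued tasks get a turn.
            queue.push_back(task);
        }
    }
    finished
}

fn main() {
    // The heavy task is queued ahead of the light one in both runs.
    let workload = || {
        vec![
            Task { name: "heavy", remaining: 1_000 },
            Task { name: "light", remaining: 1 },
        ]
    };
    println!("no yielding:  {:?}", simulate(workload(), None));
    println!("budget of 10: {:?}", simulate(workload(), Some(10)));
}
```

In this model the light task finishes at tick 1001 when the heavy task never yields, and at tick 11 with a budget of 10, while the heavy task finishes at tick 1000 versus 1001. Real schedulers have more workers and work stealing, so the numbers only illustrate the shape of the problem.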
   
   

