Samyak2 opened a new issue, #22016:
URL: https://github.com/apache/datafusion/issues/22016

   ### Describe the bug
   
   - `CoalescePartitionsExec` tasks hold an `Arc` reference to the input plan 
([ref](https://github.com/apache/datafusion/blob/c134a848289cc465b0c4c3dc8be076cc10e299db/datafusion/physical-plan/src/stream.rs#L335))
     - The reference isn't actually needed after `input.execute(..)`, except 
for printing the plan in debug logs
   - `RepartitionExec` keeps some state in the plan itself
     - One piece of that state holds an `Arc<Vec<SpawnedTask<()>>>` 
([ref](https://github.com/apache/datafusion/blob/c134a848289cc465b0c4c3dc8be076cc10e299db/datafusion/physical-plan/src/repartition/mod.rs#L195))
     - This means that the tasks are only cancelled once all `Arc` refs to 
the plan are dropped
     - So each `CoalescePartitionsExec`-`RepartitionExec` layer delays 
cancellation of the query
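
The mechanism above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `FakeTask`, `FakePlan`, `FakeParentTask`, and `cancelled_after_query_drop` are hypothetical stand-ins, not DataFusion types, and "cancellation" is modelled as a flag set when a task handle is dropped.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a spawned task handle that cancels on drop.
struct FakeTask {
    cancelled: Arc<AtomicBool>,
}

impl Drop for FakeTask {
    fn drop(&mut self) {
        // Dropping the handle is what "cancels" the task in this model.
        self.cancelled.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for a plan node that keeps task handles in its own state.
struct FakePlan {
    _tasks: Arc<Vec<FakeTask>>,
}

/// Hypothetical stand-in for a parent task that was started from `input`.
struct FakeParentTask {
    _input: Option<Arc<FakePlan>>,
}

impl FakeParentTask {
    fn new(input: Arc<FakePlan>, release_early: bool) -> Self {
        // Imagine `input.execute(..)` being called here.
        let kept = if release_early {
            drop(input); // release the plan reference once execution has started
            None
        } else {
            Some(input) // keep the plan alive for the lifetime of the parent task
        };
        FakeParentTask { _input: kept }
    }
}

/// Returns whether the fake task was already cancelled at the moment the
/// query dropped its own plan reference, while the parent task still exists.
fn cancelled_after_query_drop(release_early: bool) -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let task = FakeTask { cancelled: Arc::clone(&flag) };
    let plan = Arc::new(FakePlan { _tasks: Arc::new(vec![task]) });

    let parent = FakeParentTask::new(Arc::clone(&plan), release_early);
    drop(plan); // the query is cancelled and lets go of the plan

    let observed = flag.load(Ordering::SeqCst);
    drop(parent); // the parent task finishes some time later
    observed
}
```

In this model, `cancelled_after_query_drop(false)` reports that the task is still alive after the query drops the plan, because the parent task's `Arc` keeps the plan (and its task handles) alive. With `release_early` set to `true`, the task is cancelled as soon as the query drops the plan.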
   
   ### To Reproduce
   
   I have a reproducer here: https://github.com/Samyak2/datafusion/pull/1 
(warning: mostly LLM-generated, but I have verified that it actually checks the 
correct thing)
   
   Relevant parts of the output:
   ```
   repartition_task_group=0 input_partition=0 kind=pull_from_input 
drop_elapsed_ms=68
   repartition_task_group=1 input_partition=0 kind=pull_from_input 
drop_elapsed_ms=80
   repartition_task_group=1 input_partition=1 kind=pull_from_input 
drop_elapsed_ms=85
   output_partitions=32 input_rows_per_partition=1024000 
all_repartition_operator_drop_elapsed_ms=80
   all_repartition_task_drop_elapsed_ms=85
   all_observed_drop_elapsed_ms=85
   ```
   
   The cancellation is delayed by ~80 ms due to `CoalescePartitionsExec`.
   
   ### Expected behavior
   
   `CoalescePartitionsExec` should drop its reference to the child plan 
early, since it is not needed after `input.execute(..)`.
   
   ### Additional context
   
   I have a fix for this. Will raise a PR soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

