tirkarthi commented on PR #38434:
URL: https://github.com/apache/airflow/pull/38434#issuecomment-2016741776

   One other thing I noticed is that the "mean run duration" was 
actually the "mean total duration". Since the bars are stacked, the value used to 
calculate the mean was actually "queued duration + run duration". I mistook it 
for the run duration because most of my task instances spent less than a few 
seconds in the queued state. Below is an example of the change in value. The 
difference is more visible when tasks spend more time in the queued state than 
in actual execution: there, the mean run duration markline will be below the 
mean queued duration markline, whereas the mean total duration markline will be 
above the queued duration markline.
   
   To compute the median for the run duration only, we would have to add 
"valueDim": 2 to the second markline. Is "mean total" more useful than "mean run"?
   
   Example with mean queued, run and total plotted.
   
   ```python
   >>> import statistics
   >>> queued
   [1.75, 1.72, 1.9, 1.58, 1.81]
   >>> run
   [28.18, 20.2, 16.16, 1.19, 22.19]
   >>> statistics.median(run) # mean run duration
   20.2
   >>> statistics.median([q + r for q, r in zip(queued, run)]) # mean total duration
   21.919999999999998
   ```
   
   
![image](https://github.com/apache/airflow/assets/3972343/2d57b431-e607-441b-8772-5ae3a26e1c4f)
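
For the queue-heavy case described above, here is a minimal sketch with made-up durations (hypothetical values, not taken from any real DAG run). It uses `statistics.median` as in the example above and only illustrates the ordering of the three values, not how the chart itself computes its marklines.

```python
import statistics

# Hypothetical durations in seconds (assumed for illustration only):
# each task waits in the queue far longer than it actually runs.
queued = [40.0, 35.5, 42.0, 38.0, 41.5]
run = [2.0, 3.5, 1.5, 4.0, 2.5]

# The stacked bar height is queued + run for each task instance.
total = [q + r for q, r in zip(queued, run)]

median_queued = statistics.median(queued)
median_run = statistics.median(run)
median_total = statistics.median(total)

# A markline based on run duration alone falls below the queued markline,
# while one based on the stacked total sits above it.
print(median_queued, median_run, median_total)
```

With data like this, a run-only markline would be drawn inside the queued portion of the bars, which is the situation where the choice between "run" and "total" becomes clearly visible.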
   


