ashb commented on issue #5615: [AIRFLOW-5035] Remove multiprocessing.Manager 
in-favour of Pipes
URL: https://github.com/apache/airflow/pull/5615#issuecomment-515565720
 
 
   I haven't got nice graphs, but the "scheduler overhead" (which I've defined 
as the time from dag run start to dag run end, minus time spent in executors) 
seems relatively stable and small.
   
   Both of these cover only the first 5 dags activated (not all 20). This isn't conclusive.
   
   **1.10.4rc3**: 9.022235s (stddev 2.731696s, 41 dag runs)
   **With this branch**: 8.931782s (stddev 3.937999s, 30 dag runs)
   
   This is not the most exhaustive benchmark, but it indicates that for 
light-to-medium loads this change doesn't affect things very much.
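
As a rough sanity check on the two averages above, the difference can be compared with its standard error. This is only a sketch: it assumes both averages are in seconds and that the two sets of dag runs are independent samples, and the `welch_t` helper is a hypothetical name, not something from Airflow.

```python
import math

# Summary statistics quoted above: (mean seconds, stddev seconds, dag runs).
baseline = (9.022235, 2.731696, 41)   # 1.10.4rc3
branch = (8.931782, 3.937999, 30)     # with this branch


def welch_t(a, b):
    """Welch's t statistic for two samples given as (mean, stddev, n)."""
    mean_a, sd_a, n_a = a
    mean_b, sd_b, n_b = b
    # Standard error of the difference between the two means.
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


t_stat = welch_t(baseline, branch)
print(f"difference: {baseline[0] - branch[0]:.3f}s, Welch t: {t_stat:.3f}")
```

A t statistic well below 1 means the gap between the two averages is small compared with the run-to-run noise, which fits the reading that the change has little effect at this load.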
   
   This is the query I used to get the data:
   
   
   ```sql
   WITH
     summary AS (
       SELECT dag_run.dag_id,
         dag_run.execution_date,
         dag_run.state,
         dag_run.end_date - dag_run.start_date AS duration,
         dag_run.start_date - (dag_run.execution_date + interval '10 minutes') AS schedule_delay,
         max(task_instance.end_date) - min(task_instance.start_date) AS total_ti_exec_time,
         avg(task_instance.start_date - task_instance.queued_dttm) AS avg_queued_time
       FROM dag_run
       JOIN task_instance
       USING (dag_id, execution_date)
       GROUP BY dag_run.dag_id, dag_run.execution_date, dag_run.state, dag_run.end_date, dag_run.start_date
       ORDER BY execution_date),
     data AS (SELECT *, duration - total_ti_exec_time AS scheduler_overhead FROM summary)
   SELECT avg(scheduler_overhead),
     (stddev(extract('epoch' FROM scheduler_overhead)) || ' seconds')::interval AS stddev,
     count(*) AS num_runs
   FROM data
   ```
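
For readers who prefer code to SQL, the per-run metric the query computes can be sketched in Python. This is an illustration, not Airflow code: `scheduler_overhead` is a hypothetical helper and the timestamps are made up. Note that, like the query, it measures executor time as the window from the first task start to the last task end, not the sum of individual task durations.

```python
from datetime import datetime, timedelta


def scheduler_overhead(run_start, run_end, task_spans):
    """Dag run duration minus the first-task-start to last-task-end window.

    task_spans is a list of (start_date, end_date) pairs, one per task instance.
    """
    duration = run_end - run_start
    first_start = min(start for start, _ in task_spans)
    last_end = max(end for _, end in task_spans)
    total_ti_exec_time = last_end - first_start
    return duration - total_ti_exec_time


# Made-up example: a 60 second dag run containing two task instances.
day = datetime(2019, 7, 26)
overhead = scheduler_overhead(
    run_start=day.replace(hour=10, minute=0, second=0),
    run_end=day.replace(hour=10, minute=1, second=0),
    task_spans=[
        (day.replace(hour=10, minute=0, second=4), day.replace(hour=10, minute=0, second=30)),
        (day.replace(hour=10, minute=0, second=31), day.replace(hour=10, minute=0, second=55)),
    ],
)
print(overhead.total_seconds())
```

Because the window includes any gap between one task finishing and the next starting, scheduling delay between tasks counts as executor time here, so the metric mostly captures the delay before the first task and after the last one.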
   

