Hey,

This is to give a heads up that I am planning to merge #1514, the refactor of 
process_dag, today. This is the second step in executing on the scheduler 
roadmap. It has been running in our production for a week now with no 
functional differences. Scheduler loop times start a bit higher, but have a 
lower max. The number of connections to the database is around 1/3 of the
previous scheduler's (the test DAG went from 150 connections to 50). Database
load is slightly lower.

While the refactor fixes many issues (race conditions), a corner case mentioned
by Jeremiah is now present. A TI is sent to the executor in SCHEDULED state. If
the executor then fails to load the TI, the TI might be orphaned forever. As
fixing the corner case will require further fundamental changes, we discussed
that it should be addressed in a follow-up patch.

My planned next steps are: 1) reduce scheduler loop time to around 1s by making
task reporting “event driven”, 2) auto-align start date, 3) add a notion of
“previous” to dagrun, 4) fix the corner case mentioned above.

- Bolke

 
