potiuk commented on pull request #17545:
URL: https://github.com/apache/airflow/pull/17545#issuecomment-897436708


   Once you allow custom DAG-provided code to run during scheduling, it 
might do much more than check a few timestamps. By adding code that calls 
sleep(1000) you can break the whole scheduling process, IMHO.
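
   To make the concern concrete, here is a minimal sketch, not Airflow's 
actual scheduler code. It assumes a hypothetical single-threaded 
`scheduling_loop` that calls a DAG-provided hook inline; all names are 
made up for illustration, and a 0.2 s sleep stands in for sleep(1000).

```python
import time


def scheduling_loop(dag_hooks):
    """Hypothetical loop: one pass over all DAGs, calling each hook inline."""
    decided_at = {}
    start = time.monotonic()
    for dag_id, hook in dag_hooks.items():
        hook()  # DAG-provided code runs inside the scheduler (the risky part)
        decided_at[dag_id] = time.monotonic() - start
    return decided_at


def cheap_hook():
    """A well-behaved hook that returns immediately."""


def slow_hook():
    """Stand-in for a DAG author calling sleep(1000)."""
    time.sleep(0.2)


# Dicts preserve insertion order, so slow_dag is visited first.
delays = scheduling_loop({"slow_dag": slow_hook, "innocent_dag": cheap_hook})
print(delays)
```

In this sketch the decision for `innocent_dag` is delayed by the full sleep of 
`slow_dag`, even though its own hook does nothing.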
   
   Besides, it goes against the direction we are heading, where we want to 
add isolation and security between components. As I understand it, there 
should NEVER be code coming from a DAG executed during scheduling. That breaks 
not only the scalability and performance properties of the scheduler but also 
isolation and security. Currently, DAG-provided code is executed in two places:
   
   * During DAG parsing
   * During task execution 
   
   Maybe I am wrong, but as I understand it (and I recommend the talk from 
@ashb: 
https://airflowsummit.org/sessions/2021/deep-dive-in-to-the-airflow-scheduler/ ), 
DAG parsing is isolated from scheduling and is executed in separate 
processes.
   
   In Airflow 2, scheduling has been completely decoupled from parsing and is 
done exclusively based on database-stored information: no DAG code is ever 
executed when scheduling happens. So if we want to make a decision that affects 
the state of a Task Instance based on code coming from the DAG, the only good 
time to do it is at task execution (parsing is too early).
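
   As a rough illustration of that separation (a sketch under stated 
assumptions, not the real Airflow internals): the function names below are 
hypothetical, and a JSON string stands in for the database row. User code runs 
only in `parse_dag_file` and `run_task`; `schedule` sees stored data only.

```python
import json


def parse_dag_file():
    """Parser process: the only place DAG code runs before execution."""
    return {"dag_id": "example", "tasks": ["extract", "load"]}


# Parser side: persist plain data only (JSON string stands in for a DB row).
stored_row = json.dumps(parse_dag_file())


def schedule(row):
    """Scheduler side: reads stored data, never calls DAG-provided code."""
    dag = json.loads(row)
    return [(dag["dag_id"], task_id) for task_id in dag["tasks"]]


def run_task(task_id, should_run):
    """Worker side: DAG-provided code (should_run) is allowed to run here."""
    return "success" if should_run(task_id) else "skipped"


queued = schedule(stored_row)
# The state-changing decision is made at task execution, not while scheduling.
states = {task_id: run_task(task_id, lambda t: t != "load") for _, task_id in queued}
print(queued, states)
```

The point of the sketch is only where the decision lives: the custom check 
changes the task's state from inside `run_task`, so a slow or malicious check 
cannot stall `schedule`.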
   
   We are only going to strengthen the isolation properties of the Airflow 
architecture, not loosen them, so I do not see how this 'shortcut' can happen. 
   
   But maybe I am missing something?
   



