potiuk commented on pull request #17545: URL: https://github.com/apache/airflow/pull/17545#issuecomment-897436708
Once you allow custom DAG-provided code to run during scheduling, it might do more than just check a few timestamps. By adding code that does sleep(1000) you can break the whole scheduling process, IMHO. Besides, it goes against the direction we are heading, where we want to add isolation and security between components.

As I understand it, there should NEVER be code coming from a DAG executed during scheduling. That breaks not only the scalability and performance properties of the scheduler but also isolation and security.

Currently, executing DAG-provided code happens in two places:

* During DAG parsing
* During task execution

Maybe I am wrong, but as I understand it (and I recommend the talk from @ashb: https://airflowsummit.org/sessions/2021/deep-dive-in-to-the-airflow-scheduler/), DAG parsing is isolated from scheduling and is executed in separate processes. In Airflow 2, scheduling has been completely decoupled from parsing and is done exclusively based on database-stored information; no DAG code is ever executed when scheduling happens.

So if we want to make a decision impacting the state of a Task Instance based on some code coming from the DAG, the only good time to do it is at task execution (parsing is too early). We are only going to strengthen the isolation properties of the Airflow architecture, not loosen them, so I do not see how this 'shortcut' can happen.

But maybe I do not see something correctly?
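
To make the sleep(1000) concern concrete, here is a minimal sketch of a single-threaded loop that runs user-provided code inline. This is not Airflow's scheduler code: `scheduling_loop` and `slow_hook` are hypothetical names invented for illustration, and the sleep is shortened so the sketch finishes quickly.

```python
import time


def scheduling_loop(task_ids, hook=None):
    """Toy single-threaded scheduling pass; returns seconds spent per task.

    NOT Airflow's scheduler, only an illustration. `hook` stands in for
    hypothetical DAG-provided code that would run during scheduling.
    """
    timings = {}
    for task_id in task_ids:
        start = time.monotonic()
        if hook is not None:
            # User code runs inline, so it delays every task that follows.
            hook(task_id)
        timings[task_id] = time.monotonic() - start
    return timings


def slow_hook(task_id):
    # Stand-in for sleep(1000), kept short for demonstration purposes.
    time.sleep(0.05)


fast = scheduling_loop(["a", "b", "c"])
slow = scheduling_loop(["a", "b", "c"], hook=slow_hook)
total_fast = sum(fast.values())
total_slow = sum(slow.values())
print(f"without hook: {total_fast:.3f}s, with hook: {total_slow:.3f}s")
```

The delay grows with the number of tasks the loop touches, which is why I think running the code at task execution, in its own process, is the safer place for it.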
