ashb commented on PR #30259: URL: https://github.com/apache/airflow/pull/30259#issuecomment-1649614970
> I'm creating this cache enabled by default, and with a default TTL of 15 minute

Nope, I'm vetoing that specific combo. Enabling it by default with a 15-minute timeout is going to wreak havoc with any kind of DAG that uses Variables at the top level. With multiple schedulers, the two parsers would constantly "fight" and flip-flop the DAG structure back and forth. On top of that, the worker itself is not going to use this cache, so it can very easily end up with a task that no longer exists by the time the worker comes to execute it.

That worker situation is made worse by the poor logging in the failure mode where the task no longer exists. I think it manifests as the task failing with no logs viewable in the UI at all.

So no, we can't have this on by default, and 15 minutes is way too long as a default. This is going to break things in ways that are hard to debug, and perhaps even hard to notice. I'll only accept it off by default.

> * In addition, as far as I know, if an Airflow customer utilizes a standalone DAG processor (AIP-43), the issues of multiple schedulers causing conflict and increasing DB load are entirely eliminated.

What you've described is effectively a breaking change: what used to work now needs the cluster operator (in many larger companies a different team from the DAG authors) to make a change. This idea doesn't work for me. Nor does it help if you have so many DAGs/DAG files that you need multiple parsers. (Yes, I know it's possible to have a parser per subdir.)

In short, to my mind nothing has changed since our previous discussion.
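To make the flip-flop and missing-task scenarios concrete, here is a minimal Python sketch. Everything in it is hypothetical: `TTLCache`, `VARIABLES`, `parse_dag` and the timings are illustrative stand-ins, not Airflow's actual Variable cache or DAG-parsing code. It assumes each parser process keeps its own independent cache and that the worker reads Variables uncached, as described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical shared store standing in for the metadata DB / secrets backend.
VARIABLES: Dict[str, str] = {"n_tasks": "3"}


@dataclass
class TTLCache:
    """Hypothetical per-process read-through cache; each parser owns its own copy."""
    ttl_seconds: float
    _entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def get(self, key: str, now: float) -> str:
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self.ttl_seconds:
            return hit[0]  # served from cache, possibly stale
        value = VARIABLES[key]  # cache miss or expired: read the store
        self._entries[key] = (value, now)
        return value


def parse_dag(read_var: Callable[[str], str]) -> List[str]:
    """Hypothetical DAG file whose task list depends on a top-level Variable read."""
    return [f"task_{i}" for i in range(int(read_var("n_tasks")))]


TTL = 15 * 60  # the proposed 15-minute default
parser_a, parser_b = TTLCache(TTL), TTLCache(TTL)

dag_a_t0 = parse_dag(lambda k: parser_a.get(k, now=0))      # A caches "3"
VARIABLES["n_tasks"] = "2"                                  # someone edits the Variable
dag_b_t60 = parse_dag(lambda k: parser_b.get(k, now=60))    # B reads the fresh "2"
dag_a_t120 = parse_dag(lambda k: parser_a.get(k, now=120))  # A still serves stale "3"
dag_worker = parse_dag(lambda k: VARIABLES[k])              # worker: no cache at all
dag_a_t900 = parse_dag(lambda k: parser_a.get(k, now=TTL))  # A's entry finally expires

print(len(dag_a_t0), len(dag_b_t60), len(dag_a_t120), len(dag_worker), len(dag_a_t900))
```

Between t=60 and t=900 the two parsers disagree on whether `task_2` exists, so the stored DAG structure alternates depending on which parser wrote last. If the scheduler acts on parser A's version and queues `task_2`, the uncached worker parses a DAG in which that task does not exist.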
