ashb commented on PR #30259:
URL: https://github.com/apache/airflow/pull/30259#issuecomment-1649614970

   > I'm creating this cache enabled by default, and with a default TTL of 15 
minutes
   
   Nope, I'm vetoing that specific combo. Enabling by default with a 15-minute 
timeout is going to wreak havoc with any kind of DAG that uses Variables at the 
top level - both in a multi-scheduler environment (where the two parsers would 
constantly "fight" and flip-flop the DAG structure back and forth), and because 
the worker itself won't use this cache, a task can very easily no longer exist 
by the time the worker comes to execute it.
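   To make the flip-flop concrete, here's a minimal, hypothetical sketch (plain Python, not real Airflow APIs) of two parser processes each holding an independent TTL cache over the Variable store. When the Variable changes, a cold cache sees the new value while a warm cache keeps serving the stale one, so the two parsers disagree about the DAG's shape:

   ```python
   import time

   class TTLCache:
       """Hypothetical per-process TTL cache modelling the proposed Variable cache."""
       def __init__(self, ttl):
           self.ttl = ttl
           self._store = {}  # key -> (value, fetched_at)

       def get(self, key, fetch):
           now = time.monotonic()
           hit = self._store.get(key)
           if hit and now - hit[1] < self.ttl:
               return hit[0]          # serve cached value, possibly stale
           value = fetch(key)         # miss/expired: hit the backing DB
           self._store[key] = (value, now)
           return value

   # Stand-in for the metadata DB holding Airflow Variables.
   db = {"num_tasks": "3"}

   parser_a = TTLCache(ttl=900)  # 15-minute TTL, as proposed
   parser_b = TTLCache(ttl=900)

   def parse_dag(cache):
       # A DAG file reading a Variable at top level: the task list
       # depends on whatever value this parser's cache returns.
       n = int(cache.get("num_tasks", lambda k: db[k]))
       return [f"task_{i}" for i in range(n)]

   first_a = parse_dag(parser_a)   # fetches "3" and caches it
   db["num_tasks"] = "5"           # Variable updated in the DB
   fresh_b = parse_dag(parser_b)   # cold cache: sees 5 tasks
   stale_a = parse_dag(parser_a)   # warm cache: still sees 3 tasks

   # The two parsers now produce different DAG structures, and parser A
   # keeps re-serializing the stale shape until its TTL expires.
   print(len(stale_a), len(fresh_b))
   ```

   A worker that bypasses the cache is just another "fresh" reader here: it would look up `task_4` from the 5-task view while the serialized DAG still has 3 tasks, or vice versa.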
   
   And that worker situation is exacerbated by the fact that the failure mode 
when the task no longer exists has really poor logging - I think it manifests 
as the task failing with no logs viewable in the UI at all.
   
   So no, we can't have this on by default, and 15 mins is way too long as a 
default. This is going to break things in hard-to-debug, and perhaps even 
hard-to-notice, ways.
   
   I'll only accept off by default.
   
   > * In addition, as far as I know, if an Airflow customer utilizes a 
standalone DAG processor (AIP-43), the issues of multiple schedulers causing 
conflict and increasing DB load are entirely eliminated.
   
   You've described effectively a breaking change: what used to work before now 
needs the cluster operator (in many larger companies this is a different team 
to the DAG authors) to make a change. This idea doesn't work for me. Nor does 
it help if you have so many dags/dag files that you need multiple parsers. 
(Yes, I know it's possible to have a parser per subdir.)
   
   In short, nothing has changed since our previous discussion to my mind.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
