TL;DR; I am in favour of the change (limiting it to DagFileProcessor only) and turning using Variables at top-level as good practice (2.6+).
Here is why. I partially agree with Elad /Ash, but I am not 0/1 on that subject any more. I think there is a more nuanced approach that we could take here. I had a very similar view yesterday but I slept on it and changed my mind - to an extent. Putting aside (for a moment, I will come back to it further on) the fact that this one is addressing "bad practices" of users. I want to focus on technicalities first and risks. I initially contested the idea seeing the PR and raised my doubts (actually the discussion here is because I asked Raphael to bring it here): https://github.com/apache/airflow/pull/30259#issuecomment-148161094. The reasons I raised were based on earlier discussions and attempts to do similar things (including by myself) when I hit some problems related to multiprocessing and especially Celery. After looking at the code and hearing from Raphael I think the solution he came up with **might** work in a stable and good way for DagFileProcessor, where it is risky to implement for Celery (because it's billiard <> multiprocessing weird relation - Celery has it's own multiprocessing drop-in replacement - billiard that at times clashes with multiprocessing). The risk that this caching approach will cause problems when enabled for Celery is high and while I see how it can work for our DagFileProcessor, it will be a strong "no" for Celery. And As Daniel rightfully noticed, in K8S executor it is ineffective at all. However, I think (again putting aside the bad practices angle) - I **think** if we only do that for DagFileProcessor (I think caching in workers is generally out of question), from the technical point of view it has only benefits. * the amount of traffic it generates is WAY smaller in general - I think there is a nice angle to it that in this way it can decrease the amount of traffic generated even across multiple DAGs - imagine 100 DAGs using the same variable - with single Scheduler/DagFileProcessor that might be a really nice resource saving (not only Secrets but just DB traffic and in some cases even number of connections to establish to the DB. * this will also - interestingly - help in case of AIP-44 and db-less dag file processor (in terms of lower number of requests to the internal-API * I think in case of the DagFileProcessor, my initial worry about ttl (and sometimes seeing outdated version of the variable) is far, far less than in case of workers. Unlike for tasks /workers, DAGs are parsed in essentially random intervals and there is never a "dependency" between DAGs so whether the variable is fresh or stale a bit - does not matter (because the same DAG can be parsed now or 5 minutes ago and we have essentially no control on it and we cannot rely on it)(and. So - technically speaking, to summarize, as long as we test it thoroughly - also at scale, I see no problem with getting it in - as long as we only limit it to DagFileProcessor. Now - coming back to the reason "should we make life easier for our users who violate best practices" (essentially helping them to violate those). I think in general we should not make life easier for users who violate best practices. And I agree with Elad's points. And as a side comment I love Ash's idea about not re-parsing - this should be a discussion started separately and I would be 100% supportive for that one. It's the first time I hear it from Ash :) and I love it - there are other things to consider there - like when to trigger reparsing etc/ However, I think with this change we have the opportunity of doing something different - i.e. change our best practices. Why not changing our best practices to (essentially): "Do not use long running things like API calls. You can use Airflow Variables though - they are implemented in the way that workaround the limitations, at the expense of propagation time it takes when you change the variable" (or something like that). Philosophical rant here (sorry for those who do not like such analogies): The nice thing is that for many of our users (who currently violate them) they will suddenly with **just** upgrading Airflow fall from "violating best practices" camp to "following best practices" camp. I think that could be a major win for everyone. Yes, we can preach the best practices but seeing how often they are violated (for good reason, alternatives are not as "easy" to use), another good approach is to embrace the behaviours and let them become good practices. That worked for major religions for example (we are close to it - so think Easter being absorbed from Pagan spring equinox celebrations originally). It's a very effective strategy to bring new followers and strengthen binding with those who struggle with their practices being touted as "bad". And this all without making a single change to their DAG files/structure (this would be necessary if we follow the "no-reparsing" idea by Ash). It would also make migration to 2.6 way more attractive for many users (if it becomes part of 2.6). Think our message to our users "yes you pay a lot for Secret access, but by migrating to 2.6 your bills will drop-down dramatically" with just upgrading and no changes to your DAGs. In the current economic climate, that alone might convince some decision makers to invest in the migration to 2.6. This is not a super-strong opinion, I would also be ok if we leave things as they are, but I think that there are no technical reasons why this would not work. Those are more "communication" issues - how we communicate to the users, how to be VERY CLEAR that "using variables at top level" is only good for 2.6+ and that it is a very bad idea for pre-2.6. I think (despite many years of arguing and advocating - including my own statements/words/rants - differently) we should keep an open mind to that option. Thoughts? J On Fri, Mar 24, 2023 at 2:42 AM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > > It would not help with kubernetes executor. Did you try with local > executor? > > Another option to consider is to implement this specifically on the AWS > secrets manager secrets backend itself. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org