eladkal commented on PR #30259: URL: https://github.com/apache/airflow/pull/30259#issuecomment-1520857441
> The cluster admin wants to be able to detect "bad" DAGs by monitoring parsing time, they already do it, and they help the DAG authors in fixing it. This cluster admin would not enable this cache because their solution is to attack the problem at the root. They already have their own solution, this cache is not a new problem nor a new solution to them.

DAG parsing time by itself is not enough to isolate the `Variable.get` problem. For example, heavy imports can also increase parsing time significantly. The added option to turn the feature on or off is not really an option either. Airflow will advocate for it, solution architects will recommend it, and posts will be written about it. That will force cluster admins' hands, and most organizations will end up adopting this setting.

> The cluster admin has on their hands plenty of DAGs that take a long time to parse, they are not in the business of educating the DAG authors (for whatever reason), and they'd be happy to have a switch to flip that would lower parsing time, network traffic and possibly cloud provider costs. They had a big problem before, now they have a smaller problem.

I would argue that for such users we should recommend NOT using a secrets manager at all.

> It should not just be on variables, but connections too. Take the following example: I use Snowflake in my data lake. I store the credentials in Secrets Manager. I have 2,000 tables that update hourly from Airflow, each with a Snowflake operator. That is 24 X 2,000 X 30 = 1.44 million monthly Secrets Manager calls, at [$0.05/10,000 calls](https://aws.amazon.com/secrets-manager/pricing/) that's an extra monthly charge of $144 to pull the same connection over and over again. And that's with only 2,000 tasks per hour; lots of users have far greater usage than that.

Why? Airflow retrieves the connection when the task uses it, not during parsing. The issue is with `Variable.get()` when users call it in top-level code. Please explain how this bill applies to users who follow DAG authoring best practices.
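To make the numbers in the Snowflake example concrete, here is a back-of-the-envelope sketch. The helper names are hypothetical; the only inputs taken from the thread are the quoted rate of $0.05 per 10,000 calls and the 24 × 2,000 × 30 call count. It assumes one connection lookup per task run and, for contrast, that a top-level `Variable.get()` runs once per periodic parse of its DAG file, with the file re-parsed every 30 seconds (Airflow's default `[scheduler] min_file_process_interval`). At the quoted rate, 1,440,000 calls come to 1,440,000 / 10,000 × $0.05 = $7.20 a month, not $144.

```python
# Back-of-the-envelope secrets-backend call counts and costs.
# All function names are hypothetical; only the price and the
# 24 x 2,000 x 30 example come from the discussion.

SECONDS_PER_DAY = 24 * 60 * 60
PRICE_PER_10K_CALLS_USD = 0.05  # quoted AWS Secrets Manager rate


def monthly_cost_usd(calls: int) -> float:
    """Cost of `calls` API calls at the quoted per-10,000 rate."""
    return calls / 10_000 * PRICE_PER_10K_CALLS_USD


def task_time_lookups(tasks_per_hour: int, days: int = 30) -> int:
    """Connection fetched once per task run, when the operator executes.

    The count depends only on how many tasks actually run.
    """
    return tasks_per_hour * 24 * days


def parse_time_lookups(dag_files: int, top_level_calls_per_file: int,
                       parse_interval_s: int = 30, days: int = 30) -> int:
    """Top-level Variable.get() calls re-executed on every periodic parse.

    Assumes each DAG file is re-parsed every `parse_interval_s` seconds
    (30 is Airflow's default [scheduler] min_file_process_interval) and
    ignores any parses outside that periodic loop. The count grows with
    parse frequency, not with the number of task runs.
    """
    parses_per_file = days * SECONDS_PER_DAY // parse_interval_s
    return dag_files * top_level_calls_per_file * parses_per_file


if __name__ == "__main__":
    snowflake_calls = task_time_lookups(tasks_per_hour=2_000)
    print(f"{snowflake_calls:,} calls -> ${monthly_cost_usd(snowflake_calls):.2f}/month")
    # 1,440,000 calls -> $7.20/month

    top_level_calls = parse_time_lookups(dag_files=1, top_level_calls_per_file=1)
    print(f"{top_level_calls:,} calls -> ${monthly_cost_usd(top_level_calls):.2f}/month")
```

For a single DAG file with one top-level `Variable.get()`, `parse_time_lookups(1, 1)` gives 86,400 calls a month (about $0.43 at the same rate) whether or not any task runs, and that figure multiplies with every file and every top-level call. The task-time count, by contrast, depends only on how many tasks execute.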
> A bit of code to cache that data will reduce customer cost and improve performance.

But that is not the case. As I explained, this feature does hurt some Airflow users. We would be "helping" one user while causing pain for another.

-----------------------------------------

@john-jac @vandonr-amz is your position that this change does only good and that no one will be hurt by it, despite the examples I shared?
