eladkal commented on PR #30259:
URL: https://github.com/apache/airflow/pull/30259#issuecomment-1520857441

   > The cluster admin wants to be able to detect "bad" DAGS by monitoring 
parsing time, they already do it, and they help the DAG authors in fixing it. 
This cluster admin would not enable this cache because their solution is to 
attack the problem at the root. They already have their own solution, this 
cache is not a new problem nor a new solution to them.
   
   DAG parsing time by itself is not enough to isolate the Variable.get 
problem. For example, a heavy import may increase parsing time significantly. 
The added option to turn the feature on or off is not really an option. Airflow 
will advocate for it, solution architects will recommend it, and posts will be 
written about it. That will force cluster admins' hands, and most organizations 
will adopt this setting.
   
   > The cluster admin has on their hands plenty of DAGs that take a long time 
to parse, they are not in the business of educating the DAG authors (for 
whatever reason), and they'd be happy to have a flip to switch that would lower 
parsing time, network traffic and possibly cloud provider costs. They had a big 
problem before, now they have a smaller problem.
   
   I would argue that for such users we should recommend NOT using a secrets 
manager at all.
   
   
   > It should not just be on variables, but connections too. Take the 
following example: I use Snowflake in my data lake. I store the credentials in 
Secrets Manager. I have 2,000 tables that update hourly from Airflow, each with 
a Snowflake operator. That is 24 X 2,000 X 30 = 1.44 million monthly Secrets 
Manager calls, at 
[$0.05/10,000 calls](https://aws.amazon.com/secrets-manager/pricing/) that's an extra monthly 
charge of $144 to pull the same connection over and over again. And that's with 
only 2,000 tasks per hour--lots of users have far greater usage than that.
   
   Why? Airflow retrieves the connection when it is used, not during parsing.
   The issue is with Variable.get() when users call it in top-level code.
   Please explain how this bill arises for users who follow DAG authoring best 
practices.
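
To make the distinction concrete, here is a minimal sketch that does not use Airflow itself. `fetch_secret`, `parse_dag_file_top_level` and `parse_dag_file_deferred` are hypothetical stand-ins: the first plays the role of a secrets-backend lookup such as Variable.get(), the other two play the role of a DAG file being re-parsed. It only illustrates that a lookup in top-level code runs on every parse, while a lookup inside the task runs only when the task executes.

```python
from collections import Counter

# Hypothetical stand-in for a secrets-backend lookup; counts every call.
calls = Counter()

def fetch_secret(key: str) -> str:
    calls[key] += 1
    return f"value-of-{key}"

def parse_dag_file_top_level():
    """Simulates a DAG file that does the lookup in top-level code."""
    secret = fetch_secret("top_level")  # runs on every parse
    def task():
        return secret
    return task

def parse_dag_file_deferred():
    """Simulates a DAG file that does the lookup inside the task."""
    def task():
        return fetch_secret("deferred")  # runs only when the task executes
    return task

# Assumption for illustration: the file is parsed 100 times, the task runs once.
for _ in range(100):
    top_level_task = parse_dag_file_top_level()
    deferred_task = parse_dag_file_deferred()

top_level_task()
deferred_task()

print(calls["top_level"], calls["deferred"])  # prints "100 1"
```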
   
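As a side check on the quoted figures, the arithmetic can be worked through in a few lines. This sketch takes the call volume and the $0.05 per 10,000 calls price from the quote as given; the variable names are only for illustration. If I am reading the price correctly, those numbers give a monthly charge of $7.20 rather than $144.

```python
# Figures taken from the quoted example.
runs_per_day = 24
tables = 2_000
days_per_month = 30
price_cents_per_10k_calls = 5  # $0.05 per 10,000 calls

monthly_calls = runs_per_day * tables * days_per_month
# Integer cents avoid floating-point rounding noise.
monthly_cost_cents = monthly_calls * price_cents_per_10k_calls // 10_000

print(monthly_calls)                       # 1440000
print(f"${monthly_cost_cents / 100:.2f}")  # $7.20
```
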
   > A bit of code to cache that data will reduce customer cost and improve 
performance.
   
   But this is not the case. This feature does hurt some Airflow users, as I 
explained.
   We are "helping" one user and causing pain for another.
   
   -----------------------------------------
   
   @john-jac @vandonr-amz is your position that this change does only good and 
that no one will be hurt by it, despite the examples I shared?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.