john-jac commented on PR #30259:
URL: https://github.com/apache/airflow/pull/30259#issuecomment-1520865820

   > > It should not just be on variables, but connections too. Take the 
following example: I use Snowflake in my data lake. I store the credentials in 
Secrets Manager. I have 2,000 tables that update hourly from Airflow, each with 
a Snowflake operator. That is 24 X 2,000 X 30 = 1.44 million monthly Secrets 
Manager calls, at [$0.05/10,000 
calls](https://aws.amazon.com/secrets-manager/pricing/) that's an extra monthly 
charge of about $7.20 to pull the same connection over and over again. And that's with 
only 2,000 tasks per hour--lots of users have far greater usage than that.
   > 
   > Why? Airflow retrieves the connection upon usage, not during parsing. The 
issue is with Variable.get() when users call it in top-level code. Please 
explain how this bill arises for users who follow DAG authoring best practices.
   
   Before a task can execute, does it not both have to parse the DAG and 
retrieve the connection details?  And if a secrets backend is specified, don't 
those requests lead to Secrets Manager calls each time?  If so, then letting 
the Airflow worker cache connections and variables as well could lead to a 
dramatic reduction in cost.
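
A minimal sketch of the kind of worker-side caching suggested above, assuming a simple in-memory cache with a time-to-live. This is not Airflow's API: `TTLSecretCache`, `fake_fetch`, and the 900-second TTL are hypothetical names and values chosen for illustration, and `fake_fetch` stands in for whatever call actually reaches the secrets backend.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLSecretCache:
    """Cache secret lookups in process memory for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical stand-in for the backend call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}
        self.backend_calls = 0       # counts lookups that reached the backend

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(key)
        # Serve from the cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise go to the backend and remember the result.
        self.backend_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


def fake_fetch(conn_id: str) -> str:
    # Assumption: pretend this is a billable secrets backend request.
    return f"snowflake://user:secret@account/{conn_id}"


cache = TTLSecretCache(fake_fetch, ttl_seconds=900)
for _ in range(2000):
    cache.get("snowflake_default")
print(cache.backend_calls)  # 1
```

In this sketch the 2,000 lookups result in a single backend request. Two caveats apply: a cache like this lives in one process, so it only helps when many lookups share that process, and a rotated credential can be served stale for up to the TTL.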

