john-jac commented on PR #30259:
URL: https://github.com/apache/airflow/pull/30259#issuecomment-1520841606

   Hi Folks...I'd like to weigh in here.
   
   Airflow treats all secrets backends the same, and runs every single 
connection, variable, and configuration through them every time they are 
needed.  However, the impact on users is not the same for all backends.  Some 
backends introduce latency, some incur a cost per API call, and some, like 
Secrets Manager, do both. 
   
   As a user, I want to control how often secrets are retrieved from the source. 
The rest of the time I expect Airflow to just use the same value it retrieved a 
few minutes, or even seconds, earlier.  It's not as simple as improving DAGs, 
as many of those calls happen outside of a user's control.  That is why a cache 
is so important, and it is key for users to improve performance and reduce 
costs.  
   
   It should not just be on variables, but connections too.  Take the following 
example: I use Snowflake in my data lake.  I store the credentials in Secrets 
Manager.  I have 2,000 tables that update hourly from Airflow, each with a 
Snowflake operator.  That is 24 x 2,000 x 30 = 1.44 million monthly Secrets 
Manager calls.  At [$0.05 per 10,000 
calls](https://aws.amazon.com/secrets-manager/pricing/), that's an extra monthly 
charge of $7.20 to pull the same connection over and over again.  And that's 
with only 2,000 tasks per hour--lots of users have far greater usage than that.
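
   For anyone who wants to plug in their own numbers, here is a rough sketch of 
the arithmetic.  It assumes exactly one Secrets Manager call per task run and 
the per-10,000-call price from the pricing page linked above; the function name 
is just for illustration.

```python
from typing import Tuple


def monthly_api_cost(
    calls_per_hour: int,
    price_per_10k_usd: float,
    hours_per_day: int = 24,
    days_per_month: int = 30,
) -> Tuple[int, float]:
    """Return (monthly call count, monthly cost in USD).

    Assumes one backend call per task run; real deployments may make more.
    """
    calls = calls_per_hour * hours_per_day * days_per_month
    cost = calls / 10_000 * price_per_10k_usd
    return calls, cost


if __name__ == "__main__":
    calls, cost = monthly_api_cost(calls_per_hour=2_000, price_per_10k_usd=0.05)
    print(f"{calls:,} calls/month, ${cost:.2f}/month")
```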
   
   A bit of code to cache that data will reduce customer cost and improve 
performance.
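
   To make the idea concrete, here is a minimal sketch of the kind of 
time-to-live cache I mean.  It is not the code in this PR and not an Airflow 
API: `TTLSecretCache`, `fetch`, and `ttl_seconds` are hypothetical names, and 
the backend is stubbed out with a plain function.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLSecretCache:
    """Hypothetical in-process cache in front of a slow or billed backend."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # the real lookup, e.g. a secrets backend call
        self._ttl = ttl_seconds  # how long a retrieved value is reused
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no backend call
        value = self._fetch(key)  # missing or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    backend_calls = []

    def fake_backend(conn_id: str) -> str:
        # Stand-in for a billed API call; records each invocation.
        backend_calls.append(conn_id)
        return "snowflake://user:***@account"

    cache = TTLSecretCache(fake_backend, ttl_seconds=300)
    for _ in range(2_000):
        cache.get("snowflake_default")
    # One backend call, provided the loop finishes within the TTL.
    print(len(backend_calls))
```

   The TTL is the knob: a short value keeps rotated secrets fresh, a longer one 
trades freshness for fewer calls.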


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

