sudeepgupta90 opened a new issue, #29486: URL: https://github.com/apache/airflow/issues/29486
### Apache Airflow Provider(s)

microsoft-azure

### Versions of Apache Airflow Providers

apache-airflow-providers-microsoft-azure: 5.2.0

### Apache Airflow version

2.3.3

### Operating System

Ubuntu 18.04.6 LTS (Bionic Beaver)

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Metadatabase: Azure Postgres
Secret backend: Azure Key Vault
Deployment mode: Helm chart

### What happened

I have DAGs that can contain up to 500 tasks; most of these tasks perform a secret/connection lookup against Azure Key Vault (AKV). During DAG parsing and task execution I frequently hit timeouts because AKV throttles the requests, and I have received throttling errors from the Key Vault.

I understand I can avoid the DAG-parsing timeout by increasing `AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT`, but tasks still time out during execution. I also know that the problem of double parsing of the DAG during a DAG run has been detailed and fixed [here](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run).

What I want to explore instead is a simple caching mechanism in the AKV secrets backend, without changing or touching any of the Airflow internals.

### What you think should happen instead

There should be a caching mechanism for fetching secrets, with a cache expiry, that does not change the behaviour of Airflow internals. I have a working example that does not depend on any external library. [We change this function's signature](https://github.com/apache/airflow/blob/main/airflow/providers/microsoft/azure/secrets/key_vault.py#L178):

```
# ttl_time defaults to 1 second, so the default behaviour is effectively unchanged.
# I set it to 600 (10 minutes) in my working local example.
# ttl_time is declared as one of the __init__ parameters of this class.
@lru_cache(maxsize=128)
def _get_secret(self, path_prefix: str, secret_id: str, ttl_hash: int | None = None) -> str | None:
    """
    Get an Azure Key Vault secret value.

    :param path_prefix: prefix for the path to the secret
    :param secret_id: secret key
    :param ttl_hash: callers pass round(time.time() / self.ttl_time); since this
        value changes once per ttl_time seconds, cached entries expire naturally
    """
    name = self.build_path(path_prefix, secret_id, self.sep)
    try:
        secret = self.client.get_secret(name=name)
        return secret.value
    except ResourceNotFoundError as ex:
        self.log.debug("Secret %s not found: %s", name, ex)
        return None
```

### How to reproduce

If you have large DAGs that query the AKV secrets backend during DAG parsing and task execution, AKV will start throttling you and tasks will fail.

### Anything else

Happens on large DAGs, and frequently.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
