NBardelot opened a new issue, #39203: URL: https://github.com/apache/airflow/issues/39203
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.8.3

### What happened?

We use a structure of git submodules synchronized by git-sync sidecars, as described in [the documentation on the Typical Structure of Packages](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#typical-structure-of-packages). Some git submodules contain DAG files; other git submodules contain utility libraries imported by the DAGs. The directory containing the utility libraries is added to `PYTHONPATH` so it is available to the DAGs.

At startup everything works. The issue appears when a new module is added and a DAG is modified to import it (both changes pushed to git, with submodule updates, and the git-sync sidecar synchronizing the files). The DAG Processor component of Airflow then reprocesses the DAG bag, but the importlib cache does not seem to be invalidated, and the new module is not found. We see logs like the following in the DAG Processor, in the import errors DB table, and consequently in the UI:

```
ModuleNotFoundError: No module named 'ournewlib'
```

### What you think should happen instead?

The [documentation of `importlib`](https://docs.python.org/3/library/importlib.html#importlib.invalidate_caches) mentions that `invalidate_caches()` may be needed:

> If you are dynamically importing a module that was created since the interpreter began execution (e.g., created a Python source file), you may need to call invalidate_caches() in order for the new module to be noticed by the import system.

It seems the Airflow processes should call `invalidate_caches()` whenever the *git ref* of the synced repo changes (meaning the content of the code may have changed and should be reprocessed with fresh imports).

### How to reproduce

* With the Airflow Helm chart, for example, create an Airflow instance.
* Use git-sync to load DAGs from a git project.
* Create a simple DAG.
* See that the DAG is correctly processed.
* At runtime, push the following:
  * a new non-DAG module in the git project, with a mock class/function,
  * and an import in the DAG, in order to use this new module.
* See that the module is not imported, and an `ImportError` happens during the `DAGFileProcessorProcess`.

### Operating System

Kubernetes

### Versions of Apache Airflow Providers

```
apache-airflow[celery,kubernetes,statsd,password,ldap,otel]
apache-airflow-providers-amazon
apache-airflow-providers-common-sql
apache-airflow-providers-elasticsearch
apache-airflow-providers-hashicorp
apache-airflow-providers-http
apache-airflow-providers-microsoft-winrm
apache-airflow-providers-microsoft-azure
apache-airflow-providers-opsgenie
apache-airflow-providers-postgres
apache-airflow-providers-redis
apache-airflow-providers-sftp
apache-airflow-providers-smtp
apache-airflow-providers-ssh
```

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
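The caching behaviour described above can be reproduced outside Airflow with plain `importlib`. Below is a minimal sketch: `ournewlib` mirrors the placeholder name from the logs above, and `existing_lib` is a hypothetical helper standing in for a library that was present at startup. The directory created here plays the role of the `PYTHONPATH` directory that git-sync updates at runtime.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Directory standing in for the PYTHONPATH libs directory synced by git-sync.
libs_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(libs_dir))

# A library that exists at "startup"; importing it populates the path
# finder's cached directory listing for libs_dir.
(libs_dir / "existing_lib.py").write_text("VALUE = 1\n")
import existing_lib

# Simulate git-sync writing a brand-new module after the interpreter started.
(libs_dir / "ournewlib.py").write_text("VALUE = 42\n")

# Without this call, the next import *may* raise ModuleNotFoundError,
# because the cached directory listing predates the new file (whether it
# fails in practice depends on the filesystem's mtime granularity).
importlib.invalidate_caches()

import ournewlib  # succeeds once the caches are refreshed
print(ournewlib.VALUE)
```

This is the same pattern the `importlib` documentation quoted above recommends: any long-lived process that imports modules created after it started should call `invalidate_caches()` before attempting the import.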
