NBardelot opened a new issue, #39203:
URL: https://github.com/apache/airflow/issues/39203

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.3
   
   ### What happened?
   
   We use a structure of git submodules synchronized by git-sync sidecars, as 
per [the documentation of Typical Structure of 
Packages](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#typical-structure-of-packages).
   
   Some git submodules contain DAG files. Other git submodules contain 
utility libraries imported by the DAGs. The directory containing the utility 
libraries is added to the PYTHONPATH so that it is available to the DAGs. 
At startup everything works correctly.
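
   As a sketch of that layout (directory names are hypothetical, invented for illustration):

   ```
   repo/                    # git-sync checkout
   ├── dags-submodule/      # submodule with DAG files (on the DAGs folder path)
   └── libs-submodule/      # submodule with utility libraries, added to PYTHONPATH
       └── ourlib/
   ```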
   
   But we have an issue when a new module is added and a DAG is modified to 
import this new module (both modifications pushed to git, with submodule 
updates, and the git-sync sidecar synchronizing the files). The DAG 
Processor component of Airflow then reprocesses the DAG bag, but the 
importlib cache does not appear to be invalidated, and the new module is not found.
   
   We then see the following error in the DAG Processor logs, in the import 
errors DB table, and thus in the UI:
   
   ```
   ModuleNotFoundError: No module named 'ournewlib'
   ```
   
   ### What you think should happen instead?
   
   The [documentation of 
`importlib`](https://docs.python.org/3/library/importlib.html#importlib.invalidate_caches)
 mentions that `invalidate_caches()` might be used:
   
   > If you are dynamically importing a module that was created since the 
interpreter began execution (e.g., created a Python source file), you may need 
to call invalidate_caches() in order for the new module to be noticed by the 
import system.
   
   It seems like the Airflow processes should call `invalidate_caches()` 
whenever the synced *git ref* of the repo changes (meaning the content of the 
code may have changed and should be reprocessed with fresh imports).
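
   A minimal sketch of what the importlib docs describe, with hypothetical names (`libs_dir`, `ournewlib` stand in for the git-synced libraries directory and the newly pushed module):

   ```python
   import importlib
   import sys
   import tempfile
   from pathlib import Path

   # Simulate the git-sync situation: a libs directory is already on
   # sys.path, and a brand-new module file appears in it *after* the
   # interpreter has started.
   libs_dir = Path(tempfile.mkdtemp())
   sys.path.insert(0, str(libs_dir))

   (libs_dir / "ournewlib.py").write_text("VALUE = 42\n")

   # Per the importlib docs, invalidate the finders' caches so the new
   # file is guaranteed to be noticed by the import system.
   importlib.invalidate_caches()

   import ournewlib  # noqa: E402

   print(ournewlib.VALUE)  # 42
   ```

   Calling `invalidate_caches()` at the point where the DAG Processor detects a changed checkout would be the analogous fix.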
   
   ### How to reproduce
   
   * With the Airflow Helm chart for example, create an Airflow instance.
   * Use git-sync to load DAGs from a git project.
   * Create a simple DAG.
   * See that the DAG is correctly processed.
   * At runtime, push the following:
     * a new non-DAG module in the git project, with a mock class/function
     * and an import in the DAG, in order to use this new module
   * See that the new module is not found, and an `ImportError` occurs in the 
`DAGFileProcessorProcess`.
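
   For the last two steps, a minimal sketch of the new non-DAG module (file name and contents are hypothetical):

   ```python
   # ournewlib.py -- the new non-DAG module pushed at runtime
   def mock_answer() -> int:
       """Mock function that the modified DAG imports and calls."""
       return 42
   ```

   The DAG is then modified with a single `import ournewlib` (plus a call to `ournewlib.mock_answer()` in a task); after git-sync picks up both changes, the DAG Processor reports the `ModuleNotFoundError` shown earlier.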
   
   
   ### Operating System
   
   Kubernetes
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow[celery,kubernetes,statsd,password,ldap,otel]
   
   apache-airflow-providers-amazon
   apache-airflow-providers-common-sql
   apache-airflow-providers-elasticsearch
   apache-airflow-providers-hashicorp
   apache-airflow-providers-http
   apache-airflow-providers-microsoft-winrm
   apache-airflow-providers-microsoft-azure
   apache-airflow-providers-opsgenie
   apache-airflow-providers-postgres
   apache-airflow-providers-redis
   apache-airflow-providers-sftp
   apache-airflow-providers-smtp
   apache-airflow-providers-ssh
   ```
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

