NBardelot commented on issue #39203: URL: https://github.com/apache/airflow/issues/39203#issuecomment-2072831899
After further analysis, we think it can be a race condition as per the `importlib` [documentation of `FileFinder`](https://docs.python.org/3/library/importlib.html#importlib.machinery.FileFinder).

> The finder will cache the directory contents as necessary, making stat calls for each module search to verify the cache is not outdated. Because cache staleness relies upon the granularity of the operating system’s state information of the file system, there is a potential race condition of searching for a module, creating a new file, and then searching for the module the new file represents. If the operations happen fast enough to fit within the granularity of stat calls, then the module search will fail. To prevent this from happening, when you create a module dynamically, make sure to call importlib.invalidate_caches().

Here is the status of our cache:

```
$ python
Python 3.10.13 (main, Mar 12 2024, 12:22:40) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path_importer_cache
{'/opt/airflow/dags/repo/src/libs': FileFinder('/opt/airflow/dags/repo/src/libs'), '/usr/local/lib/python310.zip': None, '/usr/local/lib/python3.10': FileFinder('/usr/local/lib/python3.10'), '/usr/local/lib/python3.10/encodings': FileFinder('/usr/local/lib/python3.10/encodings'), '/usr/local/lib/python3.10/importlib': FileFinder('/usr/local/lib/python3.10/importlib'), '/home/airflow/.local/lib/python3.10/site-packages': FileFinder('/home/airflow/.local/lib/python3.10/site-packages'), '/usr/local/lib/python3.10/lib-dynload': FileFinder('/usr/local/lib/python3.10/lib-dynload'), '/usr/local/lib/python3.10/site-packages': FileFinder('/usr/local/lib/python3.10/site-packages'), '/opt/airflow': FileFinder('/opt/airflow'), '/usr/local/lib/python3.10/collections': FileFinder('/usr/local/lib/python3.10/collections')}
>>>
```

... where `/opt/airflow/dags/repo/src/libs` contains our utility modules.

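For illustration, the remedy recommended in the quoted documentation can be sketched as below. This is a minimal, standalone sketch and not Airflow code: the temporary directory stands in for `/opt/airflow/dags/repo/src/libs`, and the names `import_fresh_module`, `demo_utils` and `demo_utils_missing` are hypothetical. It only shows the order of operations (write the file, invalidate the finder caches, then import); it does not reproduce the race itself, which depends on file system timestamp granularity.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_fresh_module(directory: Path, name: str, source: str):
    """Write a new module file into `directory` and import it right away.

    Hypothetical helper, for illustration only.
    """
    (directory / f"{name}.py").write_text(source)
    # Ask every finder (including the FileFinder instances held in
    # sys.path_importer_cache) to drop its cached directory listing,
    # so the file written just above cannot be missed.
    importlib.invalidate_caches()
    return importlib.import_module(name)


# Assumption: a temporary directory stands in for the synchronized libs folder.
libs_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(libs_dir))

# Prime the FileFinder cache for libs_dir with a lookup that fails,
# as would happen if a module were searched for before it was synchronized.
try:
    importlib.import_module("demo_utils_missing")
except ModuleNotFoundError:
    pass

module = import_fresh_module(libs_dir, "demo_utils", "VALUE = 42\n")
print(module.VALUE)
```

In our setup the files are written by git-sync rather than by the importing process, so the call to `importlib.invalidate_caches()` would have to happen on the importing side before the module is searched for.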
As git-sync synchronizes both the DAG and the imported module in rapid succession, the race condition mentioned in the `importlib` documentation might be the cause of our issue. Here again, the documentation mentions `importlib.invalidate_caches()` as the correct way to prevent it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
