potiuk commented on PR #44086: URL: https://github.com/apache/airflow/pull/44086#issuecomment-2483898367
> To be clear, this PR doesn't change anything re lazy imports, or imports from root. This is just using the FQN for internal code. Of course it does not change it - but it **avoids** the problem by switching to FQN imports from "from airflow.sdk import" ones (this is exactly we implemented pre-commits - to forbid the internal code to use the "from airflow import" to avoid circular dependencies. History repeats itself. The true problem is that `airflow.sdk` has the same pattern - when we attempt to import some of the sub-packages at top level lazily and because our sub-packages refer to each other, this will only work if the original import is importing the sub-packages in the correct sequence, because it in some cases it will attempt to re-import the `__init__.py` that is still being imported. This will happen always when: you are importing an internal moduke of airflow (in your case `airflow.cli.cli_config`), that then imports any common object from "airflow.sdk" by some sequence of imports it does (say in your case by a sequence of imports `from airflow.sdk imports DAG` gets imported - which in turn imports `airflow.sdk.definitions.dag`), and then that `airflow.sdk.definitions.dag` - again might not be directly but by a sequence of imports, imports another common object from `airflow.sdk` - say "from airflow.sdk import BaseOperator". In this case what happens: 1) "airflow.cli_config` is "being imported" 2) it imports from `airflow.sdk import DAG`. At this moment the "airflow.sdk" module is not fully imported yet, beacause the whole "from airlfow.sdk import DAG" will only mark the "airflow.sdk" as imported AFTER that line is fully completed by Python parser - only then it will be added to "airflow.sdk". 3) so - while "airflow.sdk" is being imported, the `airflow.sdk.definitions.dag` is imported (via `__getattr__` - because DAG is in the process of being imported) 4) that - in turn - make "dag" module run (directly or indirectly through other imports run `from airflow.sdk import BaseOperator` 5) and this fails with circular import because `airflow.sdk` module is not yet fully imported and we are trying to effectively import it second time before the "on-going" import is completed and "airflow.sdk" is added to "modules" dictionary.. So yes - you are not changing aliases here, you are avoiding the problem, really. And the lazy imports are not really the root-cause of the problem, they are merely half-working workaround to actual issue we have here. In this case the root cause is not the "lazy import" itself. But the fact that we are "aliasing" multiple packages and modules in the same "airflow.sdk" package (and those classes have cross-imports). In this case we alias both "airlfow.definitions.sdk.dag.DAG", and "airflow.definitions.sdk.baseoperator.BaseOperator" to be "airflow.sdk.DAG" and "airflow.sdk.BaseOperator" respectively. This aliasing and the fact that those aliased operators are using other "aliases" in the "airflow.sdk" is the root of the problem. If we did not have those lazy imports, the whole aliasing thing would simply not work at all - you wil always get circular imports if you try to do normal "non-lazy" imports. So "lazy imports" is an attempt to workaround that circular import. But the lazy imports are only masking the problem and solving only one variant of it - where imports of aliased classes only happen ONCE in the whole import sequence when running the first import. But if the import tree attempts to import aliases twice - even with lazy imports you get "circular reference". I am not sure if there is any other solution than **not** having aliases at all, I think not because of the simple fact that "from airflow.sdk import ALIAS" will not make "airflow.sdk" fully imported until DAG is fully imported. And it's just consequence of how Python imports work. The solution of ours where we never use "aliases" in the internal code only works because it effectively avoids having two aliases imported in the "first time import sequence". If none of our internal code is using "from airlfow.sdk import", then it means that we will never attempt to load an alias from an aliased operator. That can be (similarly as in case of Airlfow) prevented and enforced by `pre-commit`. But the problem is when one of our users uses a custom code that uses "aliases" in their custom code that is imported by "airflow" code. For example when someone uses custom `SecretsManager` that is loaded by our config and that "SecretsManager" uses "from airlfow.sdk import ALIAS". In this case - depending which class is imported first, we might have another alias being imported first, then custom SecretsManager, and then again another alias from "airlfow.sdk" - which would trigger the circular import. And we cannot prevent or enforce custom code to use those aliases. There are plenty of similar issues with custom code of our users causing circular reference because of that very issue. And i am not sure if that can be solved in other way than not having the aliases at all. Unless somoene has a better idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
