potiuk commented on PR #44086:
URL: https://github.com/apache/airflow/pull/44086#issuecomment-2483898367

   > To be clear, this PR doesn't change anything re lazy imports, or imports 
from root. This is just using the FQN for internal code.
   
   Of course it does not change it - but it **avoids** the problem by switching 
to FQN imports from "from airflow.sdk import" ones (this is exactly we 
implemented pre-commits  - to  forbid  the internal code to use the "from 
airflow import" to avoid circular dependencies. History repeats itself.
   
   The true problem is that `airflow.sdk` has the same pattern - when we 
attempt to import some of the sub-packages at top level lazily and because our 
sub-packages refer to each other, this will only work if the original import is 
importing the sub-packages in the correct sequence, because it in some cases it 
will attempt to re-import the `__init__.py` that is still being imported.
   
   This will happen always when:  you are importing an internal moduke of 
airflow (in your case `airflow.cli.cli_config`), that then imports any common 
object from "airflow.sdk" by some sequence of imports it does (say in your case 
by a sequence of imports  `from airflow.sdk imports DAG` gets imported - which 
in turn imports `airflow.sdk.definitions.dag`), and then that 
`airflow.sdk.definitions.dag` - again might not be directly but by a sequence 
of imports, imports another common object from `airflow.sdk` - say "from 
airflow.sdk import BaseOperator".
   
   In this case what happens:
   
   1) "airflow.cli_config` is "being imported"
   
   2) it imports from `airflow.sdk import DAG`. At this moment the 
"airflow.sdk" module is not fully imported yet, beacause the whole "from 
airlfow.sdk import DAG" will only mark the "airflow.sdk" as imported AFTER that 
line is fully completed by Python parser - only then it will be added to 
"airflow.sdk".
   
   3) so - while "airflow.sdk" is being imported, the 
`airflow.sdk.definitions.dag` is imported (via `__getattr__` - because DAG is 
in the process of being imported) 
   
   4) that - in turn - make "dag" module run (directly or indirectly through 
other imports run `from airflow.sdk import BaseOperator`
   
   5) and this fails with circular import because `airflow.sdk` module is not 
yet fully imported and we are trying to effectively import it second time 
before the "on-going" import is completed and "airflow.sdk" is added to 
"modules" dictionary..
   
   So yes - you are not changing aliases here, you are avoiding the problem, 
really. And the lazy imports are not really the root-cause of the problem, they 
are merely half-working workaround to actual issue we have here.
   
   In this case the root cause is not the "lazy import" itself. But the fact 
that we are "aliasing" multiple packages and modules in the same "airflow.sdk" 
package (and those classes have cross-imports).  
   
   
   In this case we alias both "airlfow.definitions.sdk.dag.DAG", and 
"airflow.definitions.sdk.baseoperator.BaseOperator" to be "airflow.sdk.DAG" and 
"airflow.sdk.BaseOperator" respectively. This aliasing and the fact that those 
aliased operators are using other "aliases" in the "airflow.sdk" is the root of 
the problem. 
   
   If we did not have those lazy imports, the whole aliasing thing would simply 
not work at all - you wil always get circular imports if you try to do normal 
"non-lazy" imports. So "lazy imports" is an attempt to workaround that circular 
import.
   
   But the lazy imports  are only masking the problem and solving only one 
variant of it - where  imports of aliased classes only happen ONCE in the whole 
import sequence when running the first import. But if the import tree attempts 
to import aliases twice - even with lazy imports you get "circular reference".
   
   I am not sure if there is any other solution than **not** having aliases at 
all, I think not because of the simple fact that "from airflow.sdk import 
ALIAS" will not make "airflow.sdk" fully imported until DAG is fully imported. 
And it's just consequence of how Python imports work.
   
   The solution of ours where we never use "aliases" in the internal code only 
works because it effectively avoids having two aliases imported in the "first 
time import sequence". If none of our internal code is using "from airlfow.sdk 
import", then it means that we will never attempt to load an alias from an 
aliased operator. That can be (similarly as in case of Airlfow) prevented and 
enforced by `pre-commit`. 
   
   But the problem is when one of our users uses a custom code that uses 
"aliases" in their custom code that is imported by "airflow" code.  For example 
when someone uses custom `SecretsManager` that is loaded by our config and that 
"SecretsManager" uses "from airlfow.sdk import ALIAS". In this case - depending 
which class is imported first, we might have another alias being imported 
first, then custom SecretsManager, and then again another alias from 
"airlfow.sdk" - which would trigger the circular import. And we cannot prevent 
or enforce custom code to use those aliases. There are plenty of similar issues 
with custom code of our users causing circular reference because of that very 
issue.
   
   And i am not sure if that can be solved in other way than not having the 
aliases at all.
   
   Unless somoene has a better idea.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to