potiuk commented on code in PR #46608:
URL: https://github.com/apache/airflow/pull/46608#discussion_r1957137894


##########
providers/apache/cassandra/tests/system/__init__.py:
##########
@@ -0,0 +1,17 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+__path__ = __import__("pkgutil").extend_path(__path__, __name__)  # type: 
ignore

Review Comment:
   No, not possible I am afraid. And it is needed, because with this change we 
have the same "packages" in different places in source tree. For example when 
you have:
   
   * `amazon/tests/system/amazon/....`
   * `apache/cassandra/tests/system/apache/cassandra` 
   
   And you want to do `import system.apache.cassandra` and you have `amazon's` 
system in the PYTHONPATH first, you will get `no such package system.apache`.
   
   This is because those packages must be declared as "namespace" packages.
   
   The `__path__ = __import__("pkgutil").extend_path(__path__, __name__)` is 
the canonical way of implementing it (unless you have implicit - or so called 
native - namespace packages).
   
   What we are using now is so called "legacy namespace packages" - this is 
described here 
https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#legacy-namespace-packages
 
   
   The reason why we are using legacy namespace packages insttead of native 
ones, is mainly due to compatibiity with airflow package itself. In Airflow we 
are already using legacy namespace packages for "airflow.providers" - 
https://github.com/apache/airflow/blob/main/airflow/__init__.py#L26 - and the 
problem is that you cannot really mix native and legacy packages. 
   
   See the quotes from the documentation of namespace packages:
   
   > These two methods, that were used to create namespace packages prior to 
[PEP 420](https://peps.python.org/pep-0420/), are now considered to be obsolete 
and should not be used unless you need compatibility with packages already 
using this method. Also, 
[pkg_resources](https://setuptools.pypa.io/en/latest/pkg_resources.html) has 
been deprecated.
   
   And taking into acount this:
   
   > While native namespace packages and pkgutil-style namespace packages are 
largely compatible, pkg_resources-style namespace packages are not compatible 
with the other methods. It’s inadvisable to use different methods in different 
distributions that provide packages to the same namespace.
    
   We are not supposed to mix the two methods. So we don't.  But it also means 
that we have to make sure that all the packages that are potentially appearing 
in several places in our source tree are "legacy" namespace packages.
   
   In order to switch fully to native namespace packages (i.e. not having 
an`__init__.py` at all) - we would have to do something that I strongly 
advocate for, i.e. stop doing any work while we are importing airlfow. 
Currently airflow's `__init__.py`  does a lot of things - initalizes settings, 
sqlalchemy, logging and a number of other things. 
   
   And we do a lot of workarounds to minimise side effects of it -> lazy 
loading stuff in a number of places, not initializing ORM in other places.  
Maybe when we complete Airflow packaging work for Airlfow 3 we will get to the 
point that this initialisation and lazy loading will be removed and we just 
explicitly do initialization when we run appropriate "airflow" commands (which 
is something I advocate for a long time). 
   
   This whole "pkgutil" dance we do now could be removed if we do so - and all 
the packages in airlfow and providers could be "native namespace package" and 
we could get rid of about 100 `__init__.py`  files  (including the 
`airflow.__init__.py` one ) in our repo. We could also stop having the 
`partially initialized module (most likely due to a circular import)` pesky 
errors we and a number of our users experience - because having all the 
initialization happening in `airflow.__init__.py` is the root cause for those. 
   
   See 
https://packaging.python.org/en/latest/guides/packaging-namespace-packages/ for 
details on namespace packages.
   
   cc: @ashb @uranusjr @kaxil -> something for consideration in our discussions 
on how to repackage airlfow soon. I keep on explaining why running code in 
`airflow.__init__.py` is a bad idea and advocating for removal of it and 
replace it with explicit initialization, yet that topic have not been discussed 
yet, but I will plan to start a discussion about it soon once we approach the 
packaging subject. I am not sure what's your thinking is about this - I know 
you spent consirderable amount of time on doing all the "lazy initalization" 
dance all over the places, and I think it adds a lot of complexity to our code 
and only partially solves the `cicular imports` problem. But I know @ashb  has 
very strong feeling about being able to do "from airlfow import Dag" - which 
more or less requires all this complexity. I just don't think it's worth it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to