potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694715558



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       Just to add a few links, plus my thoughts and some context on
``__init__.py`` in relation to our DAGs. I've spent quite some time looking at
how packages/namespaces work, and while at some point I also had the impression
that "as of Python 3.3 we should drop `__init__.py` altogether", my
understanding of it is a bit different now.
   
   To focus the discussion - here is the PEP in question: 
https://www.python.org/dev/peps/pep-0420 
   
   And here is the particular section about the impact on import finders and loaders:
https://www.python.org/dev/peps/pep-0420/#impact-on-import-finders-and-loaders 
   
   My understanding of it:
   
   * the lack of `__init__.py` in a folder means that (provided the import
finder and loader support it) the folder might be detected as an "implicit
namespace package", and modules could be added to the package in their own
namespace. Being a namespace package means that the same package can be split
across multiple folders. But it really depends on the capabilities of the
loader/importer. For example, in setup.py you need to use
`find_namespace_packages` instead of `find_packages` to find namespace packages.
   
   * some tools/methods that look for packages might still not work well with
those namespace packages (they might expect the "package = single folder"
approach, for example, and break, or might not find the namespace packages at
all). It's less and less common, but it happens (for example, in mypy you need
to specify the `--namespace-packages` flag to find them:
https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-namespace-packages).
   
   * things can break easily if the same package exists both as a `classic`
package with `__init__.py` and as a namespace package. There are also a couple
of legacy ways to provide namespace packages (not all compatible with each
other); some of those are described here, for example:
https://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python
but I think there are also some other ways. Historically there were ways to
modify `__init__.py` to turn the package into a namespaced one, but they had
some limitations (for example, you had to have exactly the same ``__init__.py``
content in all the incarnations of the same package in different namespaces).
And if you mixed and matched the different ways of defining namespaces, they
might or might not work, and might behave `strangely`.
   
   * there is a serious performance implication of using "implicit namespace
packages". There is even a note in PEP 420 saying that there is no intention
to replace "classic" packages with implicit namespaces:
   
   > There is no intention to remove support of regular packages. If a 
developer knows that her package will never be a portion of a namespace 
package, then there is a performance advantage to it being a regular package 
(with an `__init__.py`). Creation and loading of a regular package can take 
place immediately when it is located along the path. With namespace packages, 
all entries in the path must be scanned before the package is created.
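
   To make the "split across multiple folders" and "all entries in the path must be scanned" points concrete, here is a minimal sketch using only the standard library. The names `nsdemo_pkg`, `root_a`, `root_b`, `alpha` and `beta` are made up for illustration and have nothing to do with Airflow; the sketch assumes a plain CPython 3.7+ import system with the default path finders.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_namespace_demo(base):
    """Create two roots, each holding one portion of the same package.

    Neither ``nsdemo_pkg`` folder gets an ``__init__.py``, so both are
    candidates for an implicit namespace package (PEP 420).
    """
    roots = []
    for root_name, module_name in (("root_a", "alpha"), ("root_b", "beta")):
        pkg_dir = base / root_name / "nsdemo_pkg"
        pkg_dir.mkdir(parents=True)
        (pkg_dir / f"{module_name}.py").write_text(f"NAME = {module_name!r}\n")
        roots.append(str(base / root_name))
    return roots


with tempfile.TemporaryDirectory() as tmp:
    roots = build_namespace_demo(Path(tmp))
    sys.path[:0] = roots  # put both roots at the front of sys.path
    importlib.invalidate_caches()
    try:
        alpha = importlib.import_module("nsdemo_pkg.alpha")
        beta = importlib.import_module("nsdemo_pkg.beta")
        pkg = sys.modules["nsdemo_pkg"]
        # A namespace package has one __path__ entry per portion found on
        # sys.path, and no file of its own.
        portions = list(pkg.__path__)
        pkg_file = getattr(pkg, "__file__", None)
    finally:
        for root in roots:
            sys.path.remove(root)

print(len(portions), pkg_file, alpha.NAME, beta.NAME)
```

   Both modules import under one package name even though they live in different folders, which is only possible because the finder walked every `sys.path` entry before creating the package.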
   
   I am not even sure whether our DAG loader supports implicit namespaces, but
even if it does, I think we should discourage the use of implicit namespaces
for DAGs: not only because of the potential clashes and conflicts with
"classic" packages or "explicit namespaces" that people can accidentally have
on their PYTHONPATH, but also because of performance.
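
   The kind of clash meant here can be sketched as follows, again with the standard library only. The names `clashdemo_pkg`, `root_ns`, `root_regular`, `one` and `two` are hypothetical. The sketch assumes the default CPython path finder, where a regular package found anywhere on `sys.path` takes precedence over namespace portions, even ones that appear earlier on the path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # A "classic" package: it has an __init__.py.
    regular_dir = base / "root_regular" / "clashdemo_pkg"
    regular_dir.mkdir(parents=True)
    (regular_dir / "__init__.py").write_text("")
    (regular_dir / "one.py").write_text("VALUE = 1\n")

    # A folder with the same name but no __init__.py (a would-be portion
    # of an implicit namespace package).
    ns_dir = base / "root_ns" / "clashdemo_pkg"
    ns_dir.mkdir(parents=True)
    (ns_dir / "two.py").write_text("VALUE = 2\n")

    # The namespace-style root deliberately comes first on sys.path.
    roots = [str(base / "root_ns"), str(base / "root_regular")]
    sys.path[:0] = roots
    importlib.invalidate_caches()
    try:
        one = importlib.import_module("clashdemo_pkg.one")
        try:
            importlib.import_module("clashdemo_pkg.two")
            two_importable = True
        except ModuleNotFoundError:
            # The regular package won, so the other folder is ignored.
            two_importable = False
    finally:
        for root in roots:
            sys.path.remove(root)

print(one.VALUE, two_importable)
```

   In this setup `clashdemo_pkg.two` cannot be imported at all: the folder without `__init__.py` is silently hidden by the classic package of the same name, which is exactly the sort of accident that is hard to debug in a DAGs folder.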
   
   I'd love to hear your thoughts about it :)
   
   




