potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694715558
##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own
simple
installable package and how to specify additional directories to be added
to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+If you want to import some packages from a directory that is added to
``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put
your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as
an example such structure
Review comment:
Just to add a few links, some context, and my thoughts on ``__init__.py``
in relation to our DAGs. I've spent quite some time looking at how
packages/namespaces work, and while at some point I also had the impression
that "as of Python 3.3 we should drop `__init__.py` altogether", my
understanding of it is a bit different now.
To focus the discussion - here is the PEP in question:
https://www.python.org/dev/peps/pep-0420
And here is the particular chapter about the impact on import finders and loaders:
https://www.python.org/dev/peps/pep-0420/#impact-on-import-finders-and-loaders
My understanding of it:
* the lack of `__init__.py` in a folder means that (provided that the import
finder and loader support it) such a folder might be detected as an "implicit
namespace package", and modules could be added to the package in their own
namespace. Separate namespaces mean that the same package can be split across
multiple folders. But it really depends on the capabilities of the
loader/importer. For example, in setup.py you need to use
`find_namespace_packages` instead of `find_packages` to find the namespace
packages
* some tools that look for packages might still not work well with those
namespace packages (they might expect the "package = single folder" approach,
for example, and break, or might not find the namespace packages at all). It's
less and less common, but it happens (for example, in mypy you need to specify
the `--namespace-packages` flag to find them:
https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-namespace-packages)
* things can get broken easily if the same package exists both as a `classic`
package with `__init__.py` and as a namespace package. There are also a couple
of legacy ways to provide namespace packages (not all compatible with each
other); some of those are described here, for example:
https://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python
but I think there are also some other ways. Historically, there were ways to
modify `__init__.py` to turn the package into a namespaced one. But they had
some limitations (for example, you had to have exactly the same
``__init__.py`` content in all the incarnations of the same package in
different namespaces). And if you mixed and matched the different ways of
defining namespaces, they might or might not work, and might behave `strangely`.
* there is a serious performance implication of using "implicit namespace
packages". There is even a note in PEP 420 that there is no intention to
replace "classic" packages with implicit namespaces:
> There is no intention to remove support of regular packages. If a
developer knows that her package will never be a portion of a namespace
package, then there is a performance advantage to it being a regular package
(with an `__init__.py`). Creation and loading of a regular package can take
place immediately when it is located along the path. With namespace packages,
all entries in the path must be scanned before the package is created.
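
To make the "split across multiple folders" point concrete, here is a minimal sketch using only the standard library. The names `demo_ns`, `dir_a`, `dir_b`, `alpha` and `beta` are made up for this illustration and have nothing to do with Airflow or its DAG loader; the sketch only assumes the default Python import machinery.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: two unrelated directories each hold a portion of "demo_ns".
root = Path(tempfile.mkdtemp())
for portion, module in (("dir_a", "alpha"), ("dir_b", "beta")):
    pkg_dir = root / portion / "demo_ns"
    pkg_dir.mkdir(parents=True)
    # No __init__.py is written, so "demo_ns" is an implicit namespace package.
    (pkg_dir / f"{module}.py").write_text(f"NAME = {module!r}\n")
    sys.path.insert(0, str(root / portion))

importlib.invalidate_caches()
demo_ns = importlib.import_module("demo_ns")
alpha = importlib.import_module("demo_ns.alpha")
beta = importlib.import_module("demo_ns.beta")

# A namespace package gets one __path__ entry per portion found on sys.path,
# and it has no __init__.py file backing it.
portions = sorted(Path(p).parent.name for p in demo_ns.__path__)
print(portions)                              # ['dir_a', 'dir_b']
print(alpha.NAME, beta.NAME)                 # alpha beta
print(getattr(demo_ns, "__file__", None))    # None
```

Both portions end up in `demo_ns.__path__`, which is also why the import system has to scan every path entry before it can create the package.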
I am not even sure whether our DAG loader supports implicit namespaces, but
even if it does, I think we should discourage the use of implicit namespaces
for DAGs - not only because of the potential clashes and conflicts with
"classic" packages or "explicit namespaces" that people can accidentally have
on their PYTHONPATH, but also because of performance.
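As a rough illustration of such a clash (again a sketch with made-up names `demo_clash`, `classic_dir` and `implicit_dir`, using plain Python imports rather than anything Airflow-specific): when a "classic" package and an `__init__.py`-less folder share a name, the classic package wins and the modules in the other folder become unreachable, even if that folder comes first on the path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Hypothetical "classic" package: has an __init__.py and one module.
classic = root / "classic_dir" / "demo_clash"
classic.mkdir(parents=True)
(classic / "__init__.py").write_text("")
(classic / "first.py").write_text("VALUE = 1\n")

# Hypothetical folder of the same name without __init__.py.
implicit = root / "implicit_dir" / "demo_clash"
implicit.mkdir(parents=True)
(implicit / "second.py").write_text("VALUE = 2\n")

# The implicit folder is deliberately placed first on sys.path.
sys.path[:0] = [str(root / "implicit_dir"), str(root / "classic_dir")]
importlib.invalidate_caches()

first = importlib.import_module("demo_clash.first")
try:
    importlib.import_module("demo_clash.second")
    second_importable = True
except ModuleNotFoundError:
    second_importable = False

print(first.VALUE)          # 1
print(second_importable)    # False
```

Someone who accidentally has both on their path would see `demo_clash.second` silently disappear, which is the kind of surprise I'd like our users to avoid.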
I'd love to hear your thoughts about it :)