potiuk commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694324576
##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own
simple
installable package and how to specify additional directories to be added
to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+If you want to import packages from a directory that is added to
+``PYTHONPATH``, you should import them using the full Python path of the
+files. All directories where you put your files must also contain an empty
+``__init__.py`` file, which turns them into Python packages. Take as an
+example the structure described below (the root directory which is on the
+``PYTHONPATH`` might be any of the directories listed in the next chapter,
+or a directory that you added to the path manually).
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see
below)
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+ <DIRECTORY ON PYTHONPATH>
+ | .airflowignore -- only needed in ``dags`` folder, see below
+ | -- my_company
+ | __init__.py
+ | common_package
+ | | __init__.py
+ | | common_module.py
+ | | subpackage
+ | | __init__.py
+ | | subpackaged_util_module.py
+ |
+ | my_custom_dags
+ | __init__.py
+ | my_dag_1.py
+ | my_dag_2.py
+ | base_dag.py
+
+In the case above, this is how you should import these Python files:
-.. code-block:: bash
+.. code-block:: python
- pip install --upgrade pip setuptools wheel
+ from my_company.common_package.common_module import SomeClass
+ from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+ from my_company.my_custom_dags.base_dag import BaseDag
-2. Create the package directory - in our case, we will call it
``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a
+file that you can put in your ``dags`` folder to tell Airflow which files
+from the ``dags`` folder should be ignored when the Airflow scheduler looks
+for DAGs. It should contain regular expressions for the paths that should be
+ignored. You do not need to have that file in any other folder on the
+``PYTHONPATH`` (and you should only keep shared code in those other folders,
+not the actual DAGs).
-.. code-block:: bash
+In the example above, the DAGs are only in the ``my_custom_dags`` folder. The
+``common_package`` folder should not be scanned by the scheduler when
+searching for DAGs, so we should ignore it. You also want to ignore
+``base_dag.py`` if you keep a base DAG class there that ``my_dag_1.py`` and
+``my_dag_2.py`` derive from. Your ``.airflowignore`` should then look like
+this:
- mkdir airflow_operators
+.. code-block:: none
-3. Create the file ``__init__.py`` inside the package and add following code:
+ my_company/common_package/.*
+ my_company/my_custom_dags/base_dag\.py
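The effect of these two patterns can be sketched in plain Python, assuming the regex matching described above (the ``is_ignored`` helper is purely illustrative, not an Airflow API):

```python
import re

# The two patterns from the ``.airflowignore`` example above.
patterns = [
    r"my_company/common_package/.*",
    r"my_company/my_custom_dags/base_dag\.py",
]


def is_ignored(rel_path: str) -> bool:
    # A file is skipped if any pattern matches its path relative to the
    # dags folder (sketch of the documented regex behaviour).
    return any(re.search(pattern, rel_path) for pattern in patterns)
```

With these patterns, ``my_company/common_package/common_module.py`` is ignored while ``my_company/my_custom_dags/my_dag_1.py`` is still scanned.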
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
- print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section
``[core]``.
+- The ``config`` folder: It is set via the ``AIRFLOW_HOME`` variable and is ``{AIRFLOW_HOME}/config`` by default.
+- The ``plugins`` folder: It is configured with option ``plugins_folder`` in section ``[core]``.
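As a quick sanity check, you can confirm from inside a parsed DAG file that these directories are on ``sys.path``. The paths below are illustrative defaults only, not values read from your configuration:

```python
import sys

# Illustrative defaults -- substitute the values from your own airflow.cfg
# and AIRFLOW_HOME environment variable.
AIRFLOW_HOME = "/opt/airflow"
expected = [
    f"{AIRFLOW_HOME}/dags",     # [core] dags_folder
    f"{AIRFLOW_HOME}/config",   # {AIRFLOW_HOME}/config
    f"{AIRFLOW_HOME}/plugins",  # [core] plugins_folder
]

# Outside a running Airflow process none of these are on sys.path.
missing = [path for path in expected if path not in sys.path]
print("missing from sys.path:", missing)
```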
-4. Create ``setup.py``:
+.. note::
+   The ``dags`` folder in Airflow 2 should not be shared with the webserver.
+   While you can do it, unlike in Airflow 1.10, Airflow 2 has no expectation
+   that the ``dags`` folder is present for the webserver. In fact, it is a
+   bit of a security risk to share the ``dags`` folder with the webserver,
+   because it means that people who write DAGs can write code that the
+   webserver will be able to execute (and the Airflow 2 approach is that the
+   webserver should never run code which can be modified by users who write
+   DAGs). Therefore, if you need to share some code with the webserver, it is
+   highly recommended that you share it via the ``config`` or ``plugins``
+   folders, or via installed Airflow packages (see below). Those folders are
+   usually managed and accessible by different users (Admins/DevOps) than the
+   DAG folders (usually data scientists), so they are considered safe because
+   they are part of the configuration of the Airflow installation and can be
+   controlled by the people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few pitfalls you should watch out for when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your DAGs/common files in a subpackage
+which is unique to your deployment (``my_company`` in the example above). It
+is far too easy to use generic names for the folders that will clash with
+other packages already present in the system. For example, if you create an
+``airflow/operators`` subfolder, it will not be accessible, because Airflow
+already has a package named ``airflow.operators`` and it will look there when
+you import ``from airflow.operators``.
+
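One way to check whether a top-level name is already taken before you adopt it is to ask Python's import machinery. The helper below is a hypothetical convenience, shown only to illustrate the clash:

```python
import importlib.util


def top_level_name_is_free(name: str) -> bool:
    # find_spec() returns None when no installed package or module
    # already claims this top-level name.
    return importlib.util.find_spec(name) is None


# "json" is taken by the standard library, so a folder named ``json``
# in your dags folder would clash with it; a deployment-specific name
# like ``my_company`` is far less likely to collide.
```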
+Don't use relative imports
+..........................
+
+Never use relative imports (imports starting with ``.``), even though Python 3 supports them.
+
+It is tempting to do something like this in ``my_dag_1.py``:
+
.. code-block:: python
- import setuptools
+ from .base_dag import BaseDag # NEVER DO THAT!!!!
- setuptools.setup(
- name="airflow_operators",
- )
+You should import such a shared DAG using the full path (starting from the
+directory which is added to ``PYTHONPATH``):
-5. Build the wheel:
+.. code-block:: python
-.. code-block:: bash
+ from my_company.my_custom_dags.base_dag import BaseDag # This is cool
- python setup.py bdist_wheel
+Relative imports are counter-intuitive; depending on how you start your
+Python code, they can behave differently. In Airflow, the same DAG file might
+be parsed in different contexts (by the scheduler, by the worker, or during
+tests), and in those cases relative imports might behave differently. Always
+use the full Python package path when you import anything in Airflow DAGs;
+this will save you a lot of trouble. You can read more about relative import
+caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
-This will create a few directories in the project and the overall structure
will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
-.. code-block:: bash
+When you create folders, you should add an empty ``__init__.py`` file to each
+of them. While Python 3 has a concept of implicit namespace packages where
+you do not have to add those files, Airflow expects the files to be present
+in all packages you add.
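The packages from the example structure above can be initialised with a few lines of Python. The ``create_packages`` helper is a sketch, not part of Airflow:

```python
import tempfile
from pathlib import Path


def create_packages(root, dotted_names):
    # Create each package directory and drop an empty __init__.py into it,
    # so the folders are regular packages rather than implicit namespaces.
    for dotted in dotted_names:
        pkg_dir = Path(root).joinpath(*dotted.split("."))
        pkg_dir.mkdir(parents=True, exist_ok=True)
        (pkg_dir / "__init__.py").touch()


# Demonstrate on a throwaway directory standing in for the dags folder.
root = tempfile.mkdtemp()
create_packages(
    root,
    [
        "my_company",
        "my_company.common_package",
        "my_company.common_package.subpackage",
        "my_company.my_custom_dags",
    ],
)
```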
Review comment:
       Hmm. I believe this is not as simple as "Python 3 does not need
``__init__.py`` files" - see my comment above. There is a good reason why we
have ``__init__.py`` files in airflow almost everywhere (except providers,
which is a special case). But maybe I am wrong and we can remove them?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]