jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694114919



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.

Review comment:
       ```suggestion
   * package your code into a Python package and install it together with Airflow.
   ```
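
The first bullet in the hunk above relies on ``PYTHONPATH`` entries being picked up by the interpreter. A stand-alone sketch of that mechanism (the temporary directory name is invented for the demo; any folder of shared code behaves the same way):

```python
import os
import subprocess
import sys
import tempfile

# Invented path standing in for a folder with shared company code.
shared = tempfile.mkdtemp(prefix="my_company_shared_")

# Launch a fresh interpreter with PYTHONPATH pointing at that folder.
env = dict(os.environ, PYTHONPATH=shared)
out = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env, capture_output=True, text=True,
).stdout

# Entries from PYTHONPATH are added to sys.path at interpreter startup.
assert shared in out.splitlines()
```

Anything importable from that folder then resolves without further configuration.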

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure

Review comment:
       ```suggestion
   it using the full Python path of the files. Take as an example such structure
   ```
   
   Python 3 doesn't need `__init__.py`'s, right, particularly when you're just dealing with a normal dir being added to `PYTHONPATH`?
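
The reviewer's hunch can be checked directly: under Python 3's namespace packages (PEP 420), a plain directory tree on ``sys.path`` imports fine without any ``__init__.py``. A sketch using throwaway names mirroring the docs' example:

```python
import os
import sys
import tempfile

# Build my_company/common_package/common_module.py with no __init__.py files.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "my_company", "common_package")
os.makedirs(pkg)
with open(os.path.join(pkg, "common_module.py"), "w") as f:
    f.write("ANSWER = 42\n")

sys.path.insert(0, root)  # same effect as putting root on PYTHONPATH
from my_company.common_package.common_module import ANSWER

assert ANSWER == 42  # imports resolve via namespace packages
```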

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.

Review comment:
       ```suggestion
   The DAGS folder in Airflow 2 should not be shared with the webserver. While you can do it, unlike in Airflow 1.10,
   Airflow has no expectations that the DAGS folder is present in the webserver. In fact it's a bit of
   security risk to share the ``dags`` folder with the webserver, because it means that people who write DAGS
   can write code that the webserver will be able to execute (ideally the webserver should
   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
   with the webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
   via installed packages (see below). Those folders are usually managed and accessible by different
   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
   safe because they are part of configuration of the Airflow installation and controlled by the
   people managing the installation.
   ```
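
For readers trying out the ``.airflowignore`` shown in this hunk: the patterns are regular expressions matched against file paths. A stand-alone approximation of the matching with plain ``re`` (an illustration only, not Airflow's actual loader code):

```python
import re

# Patterns copied from the example .airflowignore in the diff.
ignore_patterns = [
    r"my_company/common_package/.*",
    r"my_company/my_custom_dags/base_dag\.py",
]

def is_ignored(path: str) -> bool:
    # Approximates the documented behavior: a path is skipped
    # if any pattern matches somewhere in it.
    return any(re.search(pattern, path) for pattern in ignore_patterns)

assert is_ignored("my_company/common_package/common_module.py")
assert is_ignored("my_company/my_custom_dags/base_dag.py")
assert not is_ignored("my_company/my_custom_dags/my_dag_1.py")
```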

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``

Review comment:
       ```suggestion
   named ``airflow.operators`` and it will look there when importing ``from airflow.operators``.
   ```
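
The clash described in this hunk is a consequence of import resolution order: the first directory on ``sys.path`` that provides a matching package wins, so a generic name is silently shadowed. A toy demonstration with made-up package names:

```python
import os
import sys
import tempfile

# Two roots each providing a package with the same generic name.
root_a, root_b = tempfile.mkdtemp(), tempfile.mkdtemp()
for root, marker in [(root_a, "A"), (root_b, "B")]:
    pkg = os.path.join(root, "operators_demo")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(f"ORIGIN = {marker!r}\n")

sys.path[:0] = [root_a, root_b]  # root_a is earlier on sys.path, so it wins
import operators_demo

assert operators_demo.ORIGIN == "A"  # root_b's package is shadowed
```

With an installed Airflow earlier on the path, an ``airflow/operators`` subfolder loses in exactly the same way.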

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``

Review comment:
       ```suggestion
   * add your modules to one of the folders that Airflow automatically adds to ``PYTHONPATH``
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:

Review comment:
       ```suggestion
   In the case above, these are the ways you could import the python files:
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the ``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), so they are considered
+   as safe because they are part of configuration of Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``) that were added in Python 3.
+
+This is tempting to do something like that it in ``my_dag1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such shared dag using full path (starting from the directory which is added to
+``PYTHONPATH``:

Review comment:
       ```suggestion
   ``PYTHONPATH``):
   ```
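
To see why the relative import in this hunk fails: DAG files are executed as top-level modules, and a relative import can only be resolved inside a package. A minimal reproduction (the module name is hypothetical):

```python
# Simulate a DAG file loaded as a top-level module (no parent package):
# the relative import raises ImportError before any DAG code runs.
namespace = {"__name__": "my_dag_1"}  # hypothetical module name
try:
    exec("from .base_dag import BaseDag", namespace)
    raised = False
except ImportError:
    raised = True

assert raised  # "attempted relative import with no known parent package"
```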

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.
+
+The chapter follows with general description of how Python loads packages and modules and dives
+deeper into the specific of each of the three possibilities above.
+
+How package/modules Loading in Python works

Review comment:
       ```suggestion
   How package/modules loading in Python works
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be ignored when Airflow

Review comment:
       ```suggestion
   ``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -25,9 +25,21 @@ Airflow configuration. The following article will describe how you can
 create your own module so that Airflow can load it correctly, as well as
 diagnose problems when modules are not loaded properly.
 
+Often you want to use your own python code in your Airflow deployment,
+for example common code, libraries, you might want to generate DAGs using
+shared python code and have several DAG python files.
 
-Packages Loading in Python
---------------------------
+You can do it in one of those ways:
+
+* add your modules to one of the folders that airflow automatically adds to ``PYTHONPATH``
+* add extra folders where you keep your code to ``PYTHONPATH``
+* package your code into Python package and install it together with Airflow.
+
+The chapter follows with general description of how Python loads packages and modules and dives
+deeper into the specific of each of the three possibilities above.

Review comment:
       ```suggestion
   The next chapter has a general description of how Python loads packages and modules, and dives
   deeper into the specifics of each of the three possibilities above.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, those are the ways you should import the python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackge.subpackaged_util_module import 
AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it 
``airflow_operators``.
+You can see the ``.ariflowignore`` file at the root of your folder. This is a 
file that you can put in your
+``dags`` folder to tell Airflow which files from the 'dags` folder should be 
ignored when Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths 
that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (and also 
you can only keep
+shared code in the other folders, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the dags are only in ``my_custom_dags`` folder, the 
``common_package`` should not be
+scanned by scheduler when searching for DAGS, so we should ignore 
``common_package`` folder. You also
+want to ignore the ``base_dag`` if you keep a base DAG there that 
``my_dag1.py`` and ``my_dag1.py`` derives
+from. Your ``.airflowignore`` should look then like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running dynamically adds three directories to the ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section 
``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable 
(``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` Folder: It is configured with option ``plugins_folder`` in 
section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   DAGS folder in Airflow 2 should not be shared with Webserver. While you can 
do it, unlike in Airflow 1.10
+   Airflow has no expectations that the DAGS folder is present for webserver. 
In fact it's a bit of
+   security risk to share ``dags`` folder with the webserver, because it means 
that people who write DAGS
+   can write code that webserver will be able to execute (And Airflow 2 
approach is that webserver should
+   never run code which can be modified by users who write DAGs). Therefore if 
you need to share some code
+   with Webserver, it is highly recommended that you share it via ``config`` 
or ``plugins`` folder or
+   via installed airflow packages (see below). Those folders are usually 
managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data-scientists), 
so they are considered
+   as safe because they are part of configuration of Airflow installation that 
can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few watch-outs you should be careful about when you import your 
code.

Review comment:
       ```suggestion
   There are a few gotchas you should be careful about when you import your code.
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import some packages from a directory that is added to ``PYTHONPATH`` you should import
+it following the full Python path of the files. All directories where you put your files have to also
+have an empty ``__init__.py`` file which turns it into Python package. Take as an example such structure
+as described below (the root directory which is on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or those that you added manually to the path.

Review comment:
       ```suggestion
   as described below (the root directory which is on the ``PYTHONPATH``).
   ```
   
   I'm not sure what you meant by the portion I've removed. Maybe rephrase if it was important?
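For reference on the ``__init__.py`` question raised here: in Python 3.3+ a plain directory on ``sys.path`` is importable without any ``__init__.py``, via PEP 420 implicit namespace packages. A minimal sketch (all paths and names below are invented for illustration):

```python
# Sketch: import from a directory on sys.path with NO __init__.py files,
# relying on PEP 420 implicit namespace packages.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "my_company", "common_package")
os.makedirs(pkg_dir)  # note: no __init__.py is created anywhere

with open(os.path.join(pkg_dir, "common_module.py"), "w") as f:
    f.write("ANSWER = 42\n")

sys.path.insert(0, root)
# The full-Python-path import works even without __init__.py:
from my_company.common_package.common_module import ANSWER

print(ANSWER)  # prints 42
```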

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -208,71 +314,101 @@ value as shown below:
     Python PATH: [/home/arch/venv/bin:/home/arch/projects/airflow_operators:/usr/lib/python38.zip:/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/home/arch/venv/lib/python3.8/site-packages:/home/arch/airflow/dags:/home/arch/airflow/config:/home/arch/airflow/plugins]
 
 
-.. _additional-modules-in-airflow:
+Creating a package in Python
+----------------------------
 
-Additional modules in Airflow
------------------------------
+This is most organized way of adding your custom code. Thanks to using the packages,

Review comment:
       ```suggestion
   This is most organized way of adding your custom code. Thanks to using packages,
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
+Use unique top package name
+...........................
+
+It is recommended that you always put your dags/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example below). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example if you
+create ``airflow/operators`` subfolder it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``
+
+Don't use relative imports
+..........................
+
+Never use relative imports (starting with ``.``) that were added in Python 3.
+
+This is tempting to do something like that it in ``my_dag1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such shared dag using full path (starting from the directory which is added to
+``PYTHONPATH``:
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+The relative imports are counter-intuitive, depending on how you start your python code, they behave
+differently. In Airflow the same DAG file might be parsed in different context (by scheduler, by worker
+or during the tests) and in those cases, relatives imports might behave differently. Always use full
+python package path when you import anything in Airflow DAGs, this will save you a lot of troubles.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_
 
-This will create a few directories in the project and the overall structure will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
 
-.. code-block:: bash
+When you create folders you should add ``__init__.py`` file as empty files in your folders. While in Python 3
+there is a concept of implicit namespaces where you do not have to add those files to folder, Airflow
+expects that the files are added to all packages you added.

Review comment:
       Curious, what breaks with this? My simple toy examples work without `__init__.py`'s?
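Simple toy examples do indeed work without ``__init__.py``. One concrete surprise you can hit without them: two unrelated directories on ``sys.path`` that both contain a folder with the same name are silently merged into one PEP 420 namespace package, rather than one shadowing the other. A minimal sketch (all paths and names invented):

```python
# Sketch: missing __init__.py turns "my_company" into an implicit namespace
# package that silently MERGES across every sys.path entry containing it.
import os
import sys
import tempfile

root_a, root_b = tempfile.mkdtemp(), tempfile.mkdtemp()
for base, mod, body in [
    (root_a, "mod_a.py", "WHO = 'team_a'\n"),
    (root_b, "mod_b.py", "WHO = 'team_b'\n"),
]:
    pkg = os.path.join(base, "my_company")
    os.makedirs(pkg)  # no __init__.py -> implicit namespace package
    with open(os.path.join(pkg, mod), "w") as f:
        f.write(body)

sys.path[:0] = [root_a, root_b]
# Both imports succeed even though the modules live in different trees:
from my_company.mod_a import WHO as A
from my_company.mod_b import WHO as B
print(A, B)  # team_a team_b
```

With an ``__init__.py`` present, ``my_company`` becomes a regular package and only the first match on ``sys.path`` is used, which is the more predictable behavior.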

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -208,71 +314,101 @@ value as shown below:
+This is most organized way of adding your custom code. Thanks to using the packages,
+you might organize your versioning approach, control which versions of the shared code are installed
+and deploy the code to all your instances and containers in controlled way - all by system admins/DevOps
+rather than by the DAG writers. It is usually suitable when you have separate team that manages this

Review comment:
       ```suggestion
   rather than by the DAG writers. It is usually suitable when you have a separate team that manages this
   ```

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
+The relative imports are counter-intuitive, depending on how you start your python code, they behave
+differently. In Airflow the same DAG file might be parsed in different context (by scheduler, by worker
+or during the tests) and in those cases, relatives imports might behave differently. Always use full
+python package path when you import anything in Airflow DAGs, this will save you a lot of troubles.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_

Review comment:
       ```suggestion
   The relative imports are counter-intuitive, and depending on how you start your python code, they can behave
   differently. In Airflow the same DAG file might be parsed in different contexts (by schedulers, by workers
   or during tests) and in those cases, relatives imports might behave differently. Always use full
   python package paths when you import anything in Airflow DAGs, this will save you a lot of troubles.
   You can read more about relative import caveats in
   `this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
   ```
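The failure mode the quoted paragraph warns about is easy to reproduce: a file using a relative import breaks the moment Python loads it outside a package context, which is effectively how a DAG file gets parsed. A minimal sketch (the file name is illustrative):

```python
# Sketch: a top-level file with a relative import fails with ImportError
# ("attempted relative import with no known parent package").
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
dag_file = os.path.join(workdir, "my_dag_1.py")
with open(dag_file, "w") as f:
    f.write("from .base_dag import BaseDag  # the anti-pattern\n")

# Loading the file directly, with no package context:
result = subprocess.run(
    [sys.executable, dag_file], capture_output=True, text=True
)
print("ImportError" in result.stderr)  # True
```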

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in in ``dags`` folder, see below

Review comment:
       ```suggestion
      | .airflowignore  -- only needed in ``dags`` folder, see below
   ```



