potiuk commented on a change in pull request #10303:
URL: https://github.com/apache/airflow/pull/10303#discussion_r476264200
##########
File path: docs/modules_management.rst
##########
@@ -0,0 +1,194 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Modules Management
+==================
+
+Airflow allows you to use your own Python modules in the DAG and in the Airflow configuration. The following article
+will describe how you can create your own module so that Airflow can load it correctly, as well as diagnose problems
+when modules are not loaded properly.
+
+This article is the last one for you if you need to adapt Airflow to the needs of your organization.
+
+Packages Loading in Python
+--------------------------
+
+The list of directories from which Python tries to load the module is given by the variable :any:`sys.path`. Python
+really tries to `intelligently determine the contents of <https://stackoverflow.com/a/38403654>`_ of this variable,
+including depending on the operating system and how Python is installed.
+
+You can check the contents of this variable for the current Python environment by running an interactive terminal as in
+the example below:
+
+.. code-block:: pycon
+
+    >>> import sys
+    >>> from pprint import pprint
+    >>> pprint(sys.path)
+    ['',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python37.zip',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7',
+     '/home/arch/.pyenv/versions/3.7.4/lib/python3.7/lib-dynload',
+     '/home/arch/venvs/airflow/lib/python3.7/site-packages']
+
+``sys.path`` is initialized during program startup. The first precedence is given to the current directory,
+i.e, ``path[0]`` is the directory containing the current script that was used to invoke or an empty string in case
+it was an interactive shell. Second precedence is given to the ``PYTHONPATH`` if provided, followed by installation-dependent
+default paths which is managed by `site <https://docs.python.org/3/library/site.html#module-site>`_ module.
+
+``sys.path`` can also be modified during a Python session by simply using append
+(for example, ``sys.path.append("/path/to/custom/package")``). Python will start searching for packages in the newer
+paths once they're added. Airflow makes use of this feature as described in the further sections.
+
+In the variable ``sys.path`` there is a directory ``site-packages`` which contains the installed **external packages**,
+which means you can install packages with ``pip`` or ``anaconda`` and you can use them in Airflow. In the next section,
+you will learn how to create your own simple installable package and how to specify additional directories to be added
+to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
+

Review comment:
I think it would be worth adding a section about .pth files (https://docs.python.org/3/library/site.html#module-site). I often find them invaluable (especially in production installations) for modularizing access to different parts of the code. Big organisations often have a lot of independent modules and components, and often they are not installed as "pip" packages (for various reasons: compilation needs, the necessity to use code from sources, etc.)
and in those cases, adding search paths to .pth files is a really nice way of modularising such access. You then just need to drop the .pth file into one of the site directories (for example site-packages). A .pth file also has the nice property that any line in it starting with "import" is executed at every Python interpreter start. The mechanism is also used by big packages that need to be installed from sources (for example, ROS uses .pth files extensively: http://wiki.ros.org/rospy).
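
To illustrate the mechanism described in this comment, here is a minimal sketch. It assumes a throwaway temporary directory standing in for a real site directory; the names `company_src`, `company_module`, `company.pth`, `fake_site_packages` and `COMPANY_PTH_LOADED` are hypothetical and only serve the example. It calls `site.addsitedir()` by hand to trigger the same .pth processing that the `site` module performs for real site directories at interpreter startup.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: in-house sources that are NOT installed with pip.
base = tempfile.mkdtemp()
src_dir = os.path.join(base, "company_src")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "company_module.py"), "w") as f:
    f.write("GREETING = 'hello from company_module'\n")

# Stand-in for a real site directory such as site-packages (assumption).
site_dir = os.path.join(base, "fake_site_packages")
os.makedirs(site_dir)
with open(os.path.join(site_dir, "company.pth"), "w") as f:
    # A plain line is a path; it is appended to sys.path if it exists.
    f.write(src_dir + "\n")
    # A line starting with "import" is executed instead of being used as a path.
    f.write("import os; os.environ['COMPANY_PTH_LOADED'] = '1'\n")

# Process the .pth files in site_dir, as the site module does at startup
# for the real site directories.
site.addsitedir(site_dir)

import company_module  # now importable thanks to the .pth entry

print(company_module.GREETING)
print(os.path.abspath(src_dir) in sys.path)
print(os.environ.get("COMPANY_PTH_LOADED"))
```

In a real installation no call to `site.addsitedir()` is needed: dropping the .pth file into an actual site directory is enough, because those directories are scanned automatically when the interpreter starts.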
