potiuk commented on code in PR #36537: URL: https://github.com/apache/airflow/pull/36537#discussion_r1444000036
########## INSTALL: ########## @@ -1,118 +1,324 @@ # INSTALL / BUILD instructions for Apache Airflow -This is a generic installation method that requires a number of dependencies to be installed. +## Basic installation of Airflow from sources and development environment setup + +This is a generic installation method that requires minimal standard tools to develop airflow and +test it in a local virtual environment (using a standard CPython installation and `pip`). Depending on your system you might need different prerequisites, but the following systems/prerequisites are known to work: -Linux (Debian Bullseye, Bookworm and Linux Mint Debbie): +Linux (Debian Bookworm): -sudo apt install build-essential python3-dev libsqlite3-dev openssl \ - sqlite default-libmysqlclient-dev libmysqlclient-dev postgresql + sudo apt install -y --no-install-recommends apt-transport-https apt-utils ca-certificates \ + curl dumb-init freetds-bin gosu krb5-user libgeos-dev \ + ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 locales libffi8 libldap-2.5-0 libssl3 netcat-openbsd \ + lsb-release openssh-client python3-selinux rsync sasl2-bin sqlite3 sudo unixodbc -On Ubuntu 20.04 you may get an error of mariadb_config not found -and mysql_config not found. +You might need to install MariaDB development headers to build some of the dependencies: -Install MariaDB development headers: -sudo apt-get install libmariadb-dev libmariadbclient-dev + sudo apt-get install libmariadb-dev libmariadbclient-dev -MacOS (Mojave/Catalina): +On MacOS (Mojave/Catalina) you might need to install the XCode command line tools, brew, and these packages: -brew install sqlite mysql postgresql + brew install sqlite mysql postgresql -# [required] fetch the tarball and untar the source move into the directory that was untarred. +## Downloading and installing Airflow from sources -# [optional] run Apache RAT (release audit tool) to validate license headers -# RAT docs here: https://creadur.apache.org/rat/.
Requires Java and Apache Rat -java -jar apache-rat.jar -E ./.rat-excludes -d . +While you can get Airflow sources in various ways (including cloning https://github.com/apache/airflow/), the +canonical way to download it is to fetch the tarball published at https://downloads.apache.org where you can +also verify the checksum and signatures of the downloaded file. You can then un-tar the source and move into +the directory that was un-tarred. -# [optional] Airflow pulls in quite a lot of dependencies in order -# to connect to other services. You might want to test or run Airflow -# from a virtual env to make sure those dependencies are separated -# from your system wide versions +When you download source packages from https://downloads.apache.org, you download sources of Airflow and +all providers separately, however when you clone the GitHub repository at https://github.com/apache/airflow/ +you get all sources in one place. This is the most convenient way to develop Airflow and Providers together. +Otherwise you have to separately install Airflow and Providers from sources in the same environment, which +is not as convenient. -python3 -m venv PATH_TO_YOUR_VENV -source PATH_TO_YOUR_VENV/bin/activate +## Creating virtualenv -# [required] building and installing by pip (preferred) -pip install . +Airflow pulls in quite a lot of dependencies in order to connect to other services. You generally want to +test or run Airflow from a virtual env to make sure those dependencies are separated from your system +wide versions. Using a system-installed Python is strongly discouraged, as the versions of Python +shipped with operating systems often have a number of limitations and are not up to date. It is recommended +to install Python using https://www.python.org/downloads/ or other tools that use those releases. See below +for a description of `Hatch`, Airflow's tool of choice for building Airflow packages.
-# or directly -python setup.py install +Once you have a suitable Python version installed, you can create a virtualenv and activate it: -# You can also install recommended version of the dependencies by using -# constraint-python<PYTHON_MAJOR_MINOR_VERSION>.txt files as constraint file. This is needed in case -# you have problems with installing the current requirements from PyPI. -# There are different constraint files for different python versions. For example" + python3 -m venv PATH_TO_YOUR_VENV + source PATH_TO_YOUR_VENV/bin/activate -pip install . \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" +## Installing airflow locally +Installing airflow locally can be done using pip. Note that this will install the "development" version of +Airflow, where all providers are installed from local sources (if they are available), not from PyPI. +It will also not include providers that are pre-installed from PyPI. If you install from the sources of +just Airflow, you need to separately install each provider that you want to develop. If you install +from the GitHub repository, all the current providers are available after installing Airflow. -By default `pip install` in Airflow 2.0 installs only the provider packages that are needed by the extras and -install them as packages from PyPI rather than from local sources: + pip install . -pip install .[google,amazon] \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" +If you develop Airflow and iterate on it you should install it in editable mode (with the -e flag) and then +you do not need to re-install it after each change to sources. This is useful if you want to develop and +iterate on Airflow and Providers (together) from sources cloned from the GitHub repository. + pip install -e .
-You can upgrade just airflow, without paying attention to provider's dependencies by using 'constraints-no-providers' -constraint files. This allows you to keep installed provider packages. -pip install . --upgrade \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt" +You can also install optional packages that are needed to run certain tests. For example, in case of a local +installation you can install all prerequisites for the google provider, test dependencies and +all hadoop providers with this command: + + pip install -e ".[editable_google,devel_tests,devel_hadoop]" + + +or you can install all packages needed to run tests for core airflow: + + pip install -e ".[devel]" + +or you can install all packages needed to run tests for core, providers and all extensions of airflow: + + pip install -e ".[devel_all]" + +You can see the list of all available extras below. + +# Using Hatch to manage your Python, virtualenvs and build packages + +Airflow uses [hatch](https://hatch.pypa.io/) as its build and development tool of choice. It is one of the popular +build tools and environment managers for Python, maintained by the Python Packaging Authority. +It is an optional tool that is only really needed when you want to build packages from sources, but +it is also very convenient to manage your Python versions and virtualenvs. + +The Airflow project contains some pre-defined virtualenv definitions in ``pyproject.toml`` that can be +easily used by hatch to create your local venvs. This is not necessary for you to develop and test +Airflow, but it is a convenient way to manage your local Python versions and virtualenvs. + +## Installing Hatch + +You can install hatch in various ways (including GUI installers).
+ +Example using `pipx`: + + pipx install hatch + +We recommend using `pipx` as it lets you easily manage installed Python apps and later use it +to upgrade `hatch` as needed with: + + pipx upgrade hatch + +## Using Hatch to manage your Python versions + +You can also use hatch to install and manage airflow virtualenvs and development +environments. For example, you can install Python 3.10 with this command: + + hatch python install 3.10 + +or install all Python versions that are used in Airflow: + + hatch python install all + +## Using Hatch to manage your virtualenvs + +Airflow has some pre-defined virtualenvs that you can use to develop and test airflow. +You can see the list of available envs with: + + hatch env show + +This is what it shows currently: + +┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ +┃ Name        ┃ Type    ┃ Dependencies          ┃ Description                                                   ┃ +┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ +│ default     │ virtual │ apache-airflow[devel] │ Default environment with Python 3.8 for maximum compatibility │ +├─────────────┼─────────┼───────────────────────┼───────────────────────────────────────────────────────────────┤ +│ airflow-38  │ virtual │ apache-airflow[devel] │ Environment with Python 3.8                                   │ +├─────────────┼─────────┼───────────────────────┼───────────────────────────────────────────────────────────────┤ +│ airflow-39  │ virtual │ apache-airflow[devel] │ Environment with Python 3.9                                   │ +├─────────────┼─────────┼───────────────────────┼───────────────────────────────────────────────────────────────┤ +│ airflow-310 │ virtual │ apache-airflow[devel] │ Environment with Python 3.10                                  │ +├─────────────┼─────────┼───────────────────────┼───────────────────────────────────────────────────────────────┤ +│ airflow-311 │ virtual │ apache-airflow[devel] │ Environment with Python 3.11                                  │
+└─────────────┴─────────┴───────────────────────┴───────────────────────────────────────────────────────────────┘ + +The default env (if you have not used one explicitly) is `default` and it is a Python 3.8 +virtualenv for maximum compatibility. The default extra with which the environment is created is "devel", which +should be enough to develop and run the basic airflow core tests. You can create the default environment with: + + hatch env create + +You can create a specific environment by naming it in the create command: + + hatch env create airflow-310 + +You can install extras in the environment by running a pip command: + + hatch -e airflow-310 run -- pip install ".[editable_google]" + +And you can enter the environment by running a shell of your choice (for example zsh) where you +can run any commands: + + hatch -e airflow-310 shell + +You can also see where hatch created the virtualenvs and use it in your IDE or activate it manually: + + hatch env find airflow-310 + +You will get a path similar to: + + /Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow + +Then you will find the `python` binary and the `activate` script in the `bin` sub-folder of this directory and +you can configure your IDE to use this python virtualenv if you want to use that environment in your IDE. + +You can also set the default environment via the HATCH_ENV environment variable. +You can clean the env by running: -You can also install airflow in "editable mode" (with -e) flag and then provider packages are -available directly from the sources (and the provider packages installed from PyPI are UNINSTALLED in -order to avoid having providers in two places. And `provider.yaml` files are used to discover capabilities -of the providers which are part of the airflow source code.
+ hatch env prune -You can read more about `provider.yaml` and community-managed providers in -https://airflow.apache.org/docs/apache-airflow-providers/index.html for developing custom providers -and in ``CONTRIBUTING.rst`` for developing community maintained providers. +More information about hatch can be found at https://hatch.pypa.io/1.9/environment/ -This is useful if you want to develop providers: +## Using Hatch to build your packages + +You can use hatch to build an installable package from the airflow sources. Such a package will +include all metadata that is configured in `pyproject.toml` and will be installable with pip. + +The packages will have pre-installed dependencies for providers that are always +installed when Airflow is installed from PyPI. By default, both `wheel` and `sdist` packages are built. + + hatch build + +You can also build only `wheel` or `sdist` packages: + + hatch build -t wheel + hatch build -t sdist + +## Installing recommended versions of dependencies + +Whatever virtualenv solution you use, when you want to make sure you are using the same +version of dependencies as in main, you can install the recommended versions of the dependencies by using +constraint-python<PYTHON_MAJOR_MINOR_VERSION>.txt files as the `constraint` file. This might be useful +to avoid "works-for-me" syndrome, where you use a different version of dependencies than the ones +used in main, in CI tests, and by other contributors. + +There are different constraint files for different python versions. For example this command will install +all basic devel requirements and requirements of the google provider as last successfully tested for Python 3.8: + + pip install -e ".[devel,editable_google]" \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" + +You can upgrade just airflow, without paying attention to providers' dependencies by using +the 'constraints-no-providers' constraint files.
This allows you to keep installed provider dependencies +and upgrade to the latest dependency versions supported by pure airflow core. pip install -e . \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" - -You can also skip installing provider packages from PyPI by setting INSTALL_PROVIDERS_FROM_SOURCE to "true". -In this case Airflow will be installed in non-editable mode with all providers installed from the sources. -Additionally `provider.yaml` files will also be copied to providers folders which will make the providers -discoverable by Airflow even if they are not installed from packages in this case. - -INSTALL_PROVIDERS_FROM_SOURCES="true" pip install . \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" - -Airflow can be installed with extras to install some additional features (for example 'async' or 'doc' or -to install automatically providers and all dependencies needed by that provider: - -pip install .[async,google,amazon] \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt" - -The list of available extras: - -# START EXTRAS HERE -aiobotocore, airbyte, alibaba, all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, -apache.drill, apache.druid, apache.flink, apache.hdfs, apache.hive, apache.impala, apache.kafka, -apache.kylin, apache.livy, apache.pig, apache.pinot, apache.spark, apache.webhdfs, apprise, -arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cgroups, cloudant, -cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud, -deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker, -druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google, -google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, -ldap, leveldb,
microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql, -mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, -pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, -salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, -spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, -webhdfs, winrm, yandex, zendesk -# END EXTRAS HERE - -# For installing Airflow in development environments - see CONTRIBUTING.rst - -# COMPILING FRONT-END ASSETS (in case you see "Please make sure to build the frontend in static/ directory and then restart the server") -# Optional : Installing yarn - https://classic.yarnpkg.com/en/docs/install - -python setup.py compile_assets + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt" + +## All airflow extras + +Airflow has a number of extras that you can install to get additional dependencies. They sometimes install +providers and sometimes enable other features whose packages are not installed by default. + +You can read more about those extras in the extras reference: +https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html Review Comment: Because INSTALL is meant to be embedded in the `source.tar.gz` package distributed via `downloads.apache.org` - targeted at this installation option: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html#using-released-sources While `docs` packages are part of this package, you would really have to build the docs and host them locally to be able to view them in a user-friendly way, so for convenience, we mention the actual docs where the ASF hosts documentation.
More context for the interested ones: The latest packages can be found here: https://downloads.apache.org/airflow/2.8.0/ This option is rarely used by our users - most of them install airflow via `PyPI`, using .whl packages. But there are a number of users who want to do it. For example there are upstream package distributors - various linux distributions, conda, brew, etc. - that are preparing their own packages from sources and hosting them in their repos. Also there are some users who are focused on provenance or simply want to build packages from pure sources + "official" tools. We had quite a few such users who communicated this to us. And for the Apache Software Foundation - source packages which are signed, checksummed and released via downloads.apache.org are the only "official" packages and the only "legally binding" packages that are released by the Foundation. All the other packages are "convenience" binaries and you should be able to build such convenience packages (for example .sdist, .whl, our docker reference images) only using sources released via `downloads` - so the source.tar.gz package should be "enough" to rebuild Airflow (having some common and renowned build tools available). So it is important to have INSTALL documentation that describes how to build Airflow; however, since we are referring to the official Airflow documentation, it's easier to just let the user know where to look in the "official" docs rather than ask the user to look at documentation that needs to be built and hosted locally. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
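The checksum/signature verification of source packages that the comment describes can be sketched as follows. The download URLs and file names below are assumptions modeled on the https://downloads.apache.org/airflow/2.8.0/ layout (check the actual listing); the `sha512sum`/`gpg` steps themselves are the standard mechanics:

```shell
# Sketch: verifying an Apache Airflow source release before building from it.
# URLs and file names are illustrative assumptions - consult the downloads listing.
set -eu

# 1. Fetch the tarball together with its checksum and detached signature, e.g.:
#      curl -LO https://downloads.apache.org/airflow/2.8.0/apache-airflow-2.8.0-source.tar.gz
#      curl -LO https://downloads.apache.org/airflow/2.8.0/apache-airflow-2.8.0-source.tar.gz.sha512
#      curl -LO https://downloads.apache.org/airflow/2.8.0/apache-airflow-2.8.0-source.tar.gz.asc

# 2. Checksum check - demonstrated on a stand-in file so this snippet runs offline:
printf 'stand-in for the source tarball\n' > apache-airflow-source.tar.gz
sha512sum apache-airflow-source.tar.gz > apache-airflow-source.tar.gz.sha512
sha512sum -c apache-airflow-source.tar.gz.sha512

# 3. Signature check (requires the release manager keys from the project KEYS file):
#      gpg --verify apache-airflow-2.8.0-source.tar.gz.asc apache-airflow-2.8.0-source.tar.gz
```

Only step 2 executes as-is here; with the real files downloaded, the same `sha512sum -c` and `gpg --verify` invocations apply unchanged.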
