potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642609374
##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
The commands above perform initialization of the SQLite database, create admin user with admin password
and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
-Waits for celery broker connection
------------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://`` (rabbitmq) - default port 5672
-* ``redis://`` - default port 6379
-* ``postgres://`` - default port 5432
-* ``mysql://`` - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation.
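For illustration, the two entrypoint knobs touched by this hunk can be combined when starting a throw-away container. This is only a sketch: the image tag ``apache/airflow:2.1.0`` and the package pins are made-up examples, not recommendations.
```
# Sketch only: install extra PyPI packages at container start (testing only)
# and disable the DB/broker connection check via CONNECTION_CHECK_MAX_COUNT=0.
# The image tag and the pinned packages below are illustrative examples.
docker run -it \
  -e _PIP_ADDITIONAL_REQUIREMENTS="lxml==4.6.3 beautifulsoup4==4.9.3" \
  -e CONNECTION_CHECK_MAX_COUNT=0 \
  apache/airflow:2.1.0 bash -c "airflow version"
```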
Review comment:
I think this is a very useful case to mention - for `minikube` and `kind` users. The users need to be aware of the options they have in different situations. We have to remember that this documentation is for different kinds of users (for example, in the same PR we added _PIP_ADDITIONAL_REQUIREMENTS for those kinds of users - which should never be considered production use). If we add one, there is no reason we should not add the other.
I modified it a bit and spelled out minikube and kind explicitly. I also thought about it a bit more and added the Talos case as another registry-less option. It is really interesting how they implemented pass-through to the local Docker image cache and I like it a lot - it's better than the `load` methods of kind and minikube, while still providing registry-less usage of locally built images. It allows even faster iterations (not to mention the air-gapped use, which is super important for some of our users, as we've learned). It was cool to learn that.
So finally we have four methods - each for a different purpose and with different requirements/dependencies:
```
* For ``docker-compose`` deployment, that's all you need. The image is stored in the Docker engine cache
  and Docker Compose will use it from there.
* For some development-targeted Kubernetes deployments you can load the images directly into the
  Kubernetes cluster. Clusters such as `kind` or `minikube` have a dedicated ``load`` command to load the
  images into the cluster.
* In some cases (for example in `Talos
  <https://www.talos.dev/docs/v0.7/guides/configuring-pull-through-cache/#using-caching-registries-with-docker-local-cluster>`_)
  you can configure the Kubernetes cluster to also use the local Docker cache rather than a remote
  registry - this is very similar to the Docker Compose case and it is often used in air-gapped systems
  to give the Kubernetes cluster access to container images.
* Last but not least - you can push your image to a remote registry, which is the most common way of
  storing and exposing images and the most portable way of publishing them. Both Docker Compose and
  Kubernetes can make use of images exposed via registries.
```
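To make the second and fourth options from the snippet above concrete, a rough sketch of the commands involved could look like this. The image name ``my-airflow:0.0.1`` and the registry host are made-up placeholders, and exact flags may vary between kind and minikube versions.
```
# Build the image locally (Dockerfile based on the official image, not shown here)
docker build -t my-airflow:0.0.1 .

# Load the locally built image straight into a development cluster
kind load docker-image my-airflow:0.0.1       # kind
minikube image load my-airflow:0.0.1          # minikube (newer versions)

# Or push to a remote registry so any cluster can pull it
docker tag my-airflow:0.0.1 registry.example.com/my-airflow:0.0.1
docker push registry.example.com/my-airflow:0.0.1
```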
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]