jonathanjuursema commented on issue #28010:
URL: https://github.com/apache/airflow/issues/28010#issuecomment-1351541037

I've spent some time playing with our set-up to tackle some of the 
questions/challenges you set out. I have the following observations:
   
   **Is the configuration the same between the worker, webserver and 
scheduler?**
   Yes. As mentioned, we deploy Airflow in a containerized setting, and all 
containers (webserver, scheduler and worker) are all provided environment 
variables from (mostly) the same central source. To double-check, I ran the 
following command in all three containers:
   
   ```bash
   printenv | grep AIRFLOW; printenv | grep REDIS; printenv | grep CELERY
   ```
I sorted and compared the output in Excel (not by eye, but with a batch of 
_if this cell equals that cell_ formulas), and I am 100% sure all 
containers run the exact same environment config.
   
   **Can you make sure you are actually loading the intended configuration?**
I did the following. I updated the 
`/opt/airflow/config/retail_celery_config.py` discussed in my previous 
comment like this (note the broker URL):
   
   ```python
   from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
   import os
   
   CELERY_CONFIG = {
       **DEFAULT_CELERY_CONFIG,
    'broker_url': 'banaan',
       'broker_transport_options': {
           'password': os.getenv('REDIS_BROKER_MASTER_PASSWORD'),
           'master_name': os.getenv('REDIS_BROKER_MASTER_NAME')
       }
   }
   ```
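For clarity on what that dict literal does: later keys win in Python, so spreading `**DEFAULT_CELERY_CONFIG` first and then listing keys explicitly means the explicit keys override the defaults. A minimal sketch with plain dicts (the default values here are stand-ins, not Airflow's actual defaults):

```python
# Later keys win in a dict literal, so the explicit 'broker_url'
# overrides the one spread in from the defaults.
DEFAULT_CELERY_CONFIG = {          # stand-in for Airflow's real defaults
    "broker_url": "redis://redis:6379/0",
    "worker_concurrency": 16,
}

CELERY_CONFIG = {
    **DEFAULT_CELERY_CONFIG,
    "broker_url": "banaan",        # the bogus test value
}

print(CELERY_CONFIG["broker_url"])          # the override wins
print(CELERY_CONFIG["worker_concurrency"])  # untouched defaults survive
```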
   
   If I deploy this way, I'm observing the following:
   
The webserver and scheduler don't show anything weird in their logging. 
Their stdout looks fine, the scheduler's stderr is empty, and the webserver's 
stderr is below; I don't think it's related.
   ```
   
/home/airflow/.local/lib/python3.10/site-packages/azure/storage/common/_connection.py:82: SyntaxWarning: "is" with a literal. Did you mean "=="?
   [2022-12-14 14:09:29 +0000] [30] [INFO] Starting gunicorn 20.1.0
   [2022-12-14 14:09:29 +0000] [30] [INFO] Listening at: http://0.0.0.0:8080 
(30)
   [2022-12-14 14:09:29 +0000] [30] [INFO] Using worker: sync
   [2022-12-14 14:09:29 +0000] [46] [INFO] Booting worker with pid: 46
   [2022-12-14 14:09:29 +0000] [47] [INFO] Booting worker with pid: 47
   [2022-12-14 14:09:29 +0000] [48] [INFO] Booting worker with pid: 48
   [2022-12-14 14:09:29 +0000] [49] [INFO] Booting worker with pid: 49
   ```
   
   The worker, however, shows the following stdout:
   ```
    -------------- celery@f616d2ff89b0 v5.2.7 (dawn-chorus)
   --- ***** ----- 
   -- ******* ---- Linux-5.18.0-0.deb11.4-amd64-x86_64-with-glibc2.31 
2022-12-14 14:08:30
   - *** --- * --- 
   - ** ---------- [config]
   - ** ---------- .> app:         
airflow.executors.celery_executor:0x7f29e6748ac0
   - ** ---------- .> transport:   amqp://guest:**@banaan:5672//
   - ** ---------- .> results:     mysql://xxx:**@xxx:3306/xxx
   - *** --- * --- .> concurrency: 16 (prefork)
   -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this 
worker)
   --- ***** ----- 
    -------------- [queues]
                   .> default          exchange=default(direct) key=default
   ```
   
   And the following in stderr:
   ```
   [2022-12-14 14:17:24,067: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   
   [2022-12-14 14:17:56,098: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   
   [2022-12-14 14:18:28,125: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   
   [2022-12-14 14:19:00,160: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   
   [2022-12-14 14:19:32,189: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   
   [2022-12-14 14:20:04,217: ERROR/MainProcess] consumer: Cannot connect to 
amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known.
   Trying again in 32.00 seconds... (16/100)
   ```
   
   This suggests to me that _at least the worker_ is picking up the custom 
config.
   
   **Other observations.**
   
This makes me wonder: if I set the Redis config to something bogus, how come 
the webserver and scheduler don't complain?
   
In order to investigate this I set `AIRFLOW__WEBSERVER__EXPOSE_CONFIG=true` 
(`AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG` was already set before this 
experiment).
   
Now I can observe the configuration in the Airflow web interface. This page 
has two sections. `/opt/airflow/airflow.cfg` shows the Airflow config file; 
this is just the default file. We don't specify this file ourselves, so we're 
using the one that comes with the upstream Airflow container.
   
   Under `Running Configuration` we can see the actual running configuration, 
and here I see something interesting:
   | Section | Key | Value | Source |
   | --- | --- | --- | --- |
   | celery | broker_url | redis://redis:6379/0 | airflow.cfg |
| celery | celery_config_options | retail_celery_config.CELERY_CONFIG | env var |
   
So Airflow loads the reference to our custom Celery config dict (as discussed 
earlier) from the env var, but it also loads the `broker_url` from the 
`airflow.cfg` config file. The worker appears to use the value from our custom 
config dict (since the logging clearly shows the test string there). The 
webserver and scheduler, I think, fall back to the default broker URL from 
`airflow.cfg` (or at least seem to ignore our custom dict). They don't show any 
connection errors, however (I've shared the logs above; their stdout doesn't 
reference the test string anywhere, nor is there any indication something is 
wrong). According to [the 
docs](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html),
 I'd expect the environment variable to take priority. I'm not sure why, if 
`redis://redis:6379/0` does not exist, the webserver and scheduler seem to 
work fine.
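To make the precedence question concrete, here's a simplified model of the lookup order the docs describe (env var `AIRFLOW__{SECTION}__{KEY}` first, then `airflow.cfg`, then built-in defaults). This is an illustrative sketch, not Airflow's actual implementation, and the URLs are made up:

```python
import os

def get_config(section: str, key: str, cfg: dict, defaults: dict):
    """Simplified model of the documented lookup order:
    environment variable > airflow.cfg > built-in default."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in os.environ:
        return os.environ[env_name], "env var"
    if key in cfg.get(section, {}):
        return cfg[section][key], "airflow.cfg"
    return defaults[section][key], "default"

cfg = {"celery": {"broker_url": "redis://redis:6379/0"}}
defaults = {"celery": {"broker_url": "redis://localhost:6379/0"}}

# With no env override, the cfg value wins over the default:
print(get_config("celery", "broker_url", cfg, defaults))

# With the env var set, it takes priority over the cfg file:
os.environ["AIRFLOW__CELERY__BROKER_URL"] = "redis://other:6379/1"
print(get_config("celery", "broker_url", cfg, defaults))
```

Note the possible catch: `celery_config_options` is itself just one key in this per-key lookup, so the `broker_url` inside the dict it points to might not feed back into the `broker_url` row of the table above.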
   
I've also searched our log aggregator (the container UI is not the best tool 
for investigating logs older than a few minutes) for the test string, and for 
the string `redis`. The first only shows log lines from the worker container 
(the ones I shared above); the second shows the following:
   
   ```
   Date,Host,Service,Container Name,Message
   
"2022-12-14T13:49:07.140Z","""vmXXXX""","""airflow""","""airflow-init-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14
 13:49:07,140] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T13:49:11.403Z","""vmXXXX""","""airflow""","""airflow-init-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14
 13:49:11,402] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T13:49:37.692Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14
 13:49:37,692] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T13:49:43.940Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14
 13:49:43,940] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T13:49:47.090Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14
 13:49:47,088] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/client.py"", line 
1378, in ping"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/client.py"", line 
898, in execute_command"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/connection.py"", 
line 1192, in get_connection"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 
44, in connect"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 
106, in get_master_address"
   
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File
 ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 
219, in discover_master"
   
"2022-12-14T14:04:13.228Z","""vmXXXX""","""airflow""","""airflow-init-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14
 14:04:13,227] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:04:17.405Z","""vmXXXX""","""airflow""","""airflow-init-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14
 14:04:17,405] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:04:45.321Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14
 14:04:45,320] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:04:52.018Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14
 14:04:52,018] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:04:55.988Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14
 14:04:55,987] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:08:44.904Z","""vmXXXX""","""airflow""","""airflow-init-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14
 14:08:44,904] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:08:49.385Z","""vmXXXX""","""airflow""","""airflow-init-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14
 14:08:49,384] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:09:18.278Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14
 14:09:18,278] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:09:24.433Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14
 14:09:24,433] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
"2022-12-14T14:09:29.312Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14
 14:09:29,311] {providers_manager.py:433} DEBUG - Loading 
EntryPoint(name='provider_info', 
value='airflow.providers.redis.get_provider_info:get_provider_info', 
group='apache_airflow_provider') from package 
apache-airflow-providers-redis"
   
   ```
   
   Looking forward to your observations! Do let me know if there's any more 
information I can provide. :)

