jonathanjuursema commented on issue #28010:
URL: https://github.com/apache/airflow/issues/28010#issuecomment-1351541037
I've spent some time playing with our set-up to tackle some of the questions/challenges you set out. I have the following observations:
**Is the configuration the same between the worker, webserver and
scheduler?**
Yes. As mentioned, we deploy Airflow in a containerized setting, and all containers (webserver, scheduler and worker) are provided environment variables from (mostly) the same central source. To double-check, I've run the following command in all three containers:
```bash
printenv | grep AIRFLOW; printenv | grep REDIS; printenv | grep CELERY
```
I sorted and compared the output in Excel (not by eye, but with a bunch of _if this cell equals that cell_ formulas) and I am 100% sure all containers run the exact same environment config.
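For anyone wanting to reproduce this comparison without Excel, a small script can diff the `printenv` dumps directly. This is just a sketch; it assumes you've saved each container's output to a string or file (the inline sample values below are made up for illustration):

```python
def parse_env(dump: str) -> dict:
    """Parse `printenv`-style output (KEY=VALUE lines) into a dict."""
    pairs = {}
    for line in dump.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs

def env_diff(a: dict, b: dict) -> dict:
    """Keys missing on one side or carrying different values, as (a, b) pairs."""
    return {
        k: (a.get(k), b.get(k))
        for k in a.keys() | b.keys()
        if a.get(k) != b.get(k)
    }

# Hypothetical sample dumps; in practice, read the files produced by printenv.
webserver = parse_env("AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0")
worker = parse_env("AIRFLOW__CELERY__BROKER_URL=redis://other:6379/0")
print(env_diff(webserver, worker))
# → {'AIRFLOW__CELERY__BROKER_URL': ('redis://redis:6379/0', 'redis://other:6379/0')}
```

An empty dict from `env_diff` confirms two containers see identical config, which is the same conclusion the spreadsheet comparison reached.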
**Can you make sure you are actually loading the intended configuration?**
I did the following. I've updated the `/opt/airflow/config/retail_celery_config.py` discussed in my previous comment like this (note the bogus broker URL):
```python
from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
import os

CELERY_CONFIG = {
    **DEFAULT_CELERY_CONFIG,
    'broker_url': 'banaan',
    'broker_transport_options': {
        'password': os.getenv('REDIS_BROKER_MASTER_PASSWORD'),
        'master_name': os.getenv('REDIS_BROKER_MASTER_NAME')
    }
}
```
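For clarity on what this merge does: keys listed after `**DEFAULT_CELERY_CONFIG` override the defaults, while untouched keys pass through. A minimal sketch, using a stand-in dict rather than the real `DEFAULT_CELERY_CONFIG` (whose actual contents I'm not reproducing here):

```python
# Stand-in for airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG;
# the keys and values here are illustrative assumptions only.
DEFAULT = {
    "broker_url": "redis://redis:6379/0",
    "worker_concurrency": 16,
}

# Same merge pattern as in retail_celery_config.py above.
CELERY_CONFIG = {**DEFAULT, "broker_url": "banaan"}

print(CELERY_CONFIG)
# → {'broker_url': 'banaan', 'worker_concurrency': 16}
```

So any component that actually consumes this dict should see the test broker URL, which is exactly what makes the worker-only errors below interesting.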
If I deploy this way, I observe the following. The webserver and scheduler don't show anything weird in their logging: their stdout looks fine, the scheduler's stderr is empty, and the webserver's stderr is below. I don't think that is related.
```
/home/airflow/.local/lib/python3.10/site-packages/azure/storage/common/_connection.py:82: SyntaxWarning: "is" with a literal. Did you mean "=="?
[2022-12-14 14:09:29 +0000] [30] [INFO] Starting gunicorn 20.1.0
[2022-12-14 14:09:29 +0000] [30] [INFO] Listening at: http://0.0.0.0:8080 (30)
[2022-12-14 14:09:29 +0000] [30] [INFO] Using worker: sync
[2022-12-14 14:09:29 +0000] [46] [INFO] Booting worker with pid: 46
[2022-12-14 14:09:29 +0000] [47] [INFO] Booting worker with pid: 47
[2022-12-14 14:09:29 +0000] [48] [INFO] Booting worker with pid: 48
[2022-12-14 14:09:29 +0000] [49] [INFO] Booting worker with pid: 49
```
The worker, however, shows the following stdout:
```
 -------------- celery@f616d2ff89b0 v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.18.0-0.deb11.4-amd64-x86_64-with-glibc2.31 2022-12-14 14:08:30
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7f29e6748ac0
- ** ---------- .> transport:   amqp://guest:**@banaan:5672//
- ** ---------- .> results:     mysql://xxx:**@xxx:3306/xxx
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default
```
And the following in stderr:
```
[2022-12-14 14:17:24,067: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
[2022-12-14 14:17:56,098: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
[2022-12-14 14:18:28,125: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
[2022-12-14 14:19:00,160: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
[2022-12-14 14:19:32,189: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
[2022-12-14 14:20:04,217: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@banaan:5672//: [Errno -2] Name or service not known. Trying again in 32.00 seconds... (16/100)
```
This suggests to me that _at least the worker_ is picking up the custom
config.
**Other observations.**
This makes me wonder: if I set the broker config to something bogus, how come the webserver and scheduler don't complain?
To investigate this I set `AIRFLOW__WEBSERVER__EXPOSE_CONFIG=true` (`AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG` has already been on since the start of this experiment).
Now I can observe the configuration in the Airflow web interface. This page has two sections. `/opt/airflow/airflow.cfg` shows the Airflow config file; this is just the default file, since we don't specify one ourselves and use the one that comes with the upstream Airflow container.
Under `Running Configuration` we can see the actual running configuration,
and here I see something interesting:
| Section | Key | Value | Source |
| --- | --- | --- | --- |
| celery | broker_url | redis://redis:6379/0 | airflow.cfg |
| celery | celery_config_options | retail_celery_config.CELERY_CONFIG | env var |
It loads our referenced custom celery config dict (as discussed earlier) from the env var. However, it also loads the `broker_url` from the `airflow.cfg` config file. Somehow, the worker appears to use the one from our custom config dict (since the logging clearly shows the test string there). The webserver and scheduler, I think, fall back to the default broker URL from `airflow.cfg` (or at least seem to ignore our custom dict). They don't show any connection errors, however: in the logs I've shared above, the stdout doesn't reference the test string anywhere, nor is there any indication that something is wrong. According to [the docs](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html), I'd expect the environment variable to take priority. I'm not sure why the webserver and scheduler seem to work fine if `redis://redis:6379/0` does not exist.
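One way to pin this down further could be to print, inside each container, both the broker URL Airflow's config layer resolves and the one the Celery app (the `airflow.executors.celery_executor` app named in the worker banner) actually ended up with. A rough diagnostic sketch, assuming Airflow 2.x with the CeleryExecutor; the function name is mine, not an Airflow API:

```python
def resolved_broker_urls():
    """Return (config-layer broker_url, celery-app broker_url),
    or None when Airflow isn't importable in this environment."""
    try:
        # conf is Airflow's layered config accessor (env vars + airflow.cfg).
        from airflow.configuration import conf
        # app is the Celery application the worker banner refers to.
        from airflow.executors.celery_executor import app
    except ImportError:
        return None
    return conf.get("celery", "broker_url"), app.conf.broker_url

urls = resolved_broker_urls()
if urls is None:
    print("Airflow is not installed in this environment.")
else:
    print("config-layer broker_url:", urls[0])
    print("celery-app  broker_url:", urls[1])
```

If the two values disagree in the webserver or scheduler container but agree in the worker, that would confirm the suspicion that only the worker consumes `celery_config_options`.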
I've also searched our log aggregator (the container UI isn't great for investigating logs older than a few minutes) for the test string and for the string `redis`. The first only shows log lines from the worker container (the ones I shared above); the second shows the following:
```
Date,Host,Service,Container Name,Message
"2022-12-14T13:49:07.140Z","""vmXXXX""","""airflow""","""airflow-init-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14 13:49:07,140] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T13:49:11.403Z","""vmXXXX""","""airflow""","""airflow-init-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14 13:49:11,402] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T13:49:37.692Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14 13:49:37,692] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T13:49:43.940Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14 13:49:43,940] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T13:49:47.090Z","""vmXXXX""","""airflow""","""airflow-webserver-5df407da-2388-dc6f-15be-78cec8708021""","[2022-12-14 13:49:47,088] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/client.py"", line 1378, in ping"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/client.py"", line 898, in execute_command"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/connection.py"", line 1192, in get_connection"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 44, in connect"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 106, in get_master_address"
"2022-12-14T14:02:36.727Z","""vmXXXX""","""airflow""","""airflow-worker-8860332e-07cf-20dc-19b1-56c4ba462531""","File ""/home/airflow/.local/lib/python3.10/site-packages/redis/sentinel.py"", line 219, in discover_master"
"2022-12-14T14:04:13.228Z","""vmXXXX""","""airflow""","""airflow-init-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14 14:04:13,227] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:04:17.405Z","""vmXXXX""","""airflow""","""airflow-init-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14 14:04:17,405] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:04:45.321Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14 14:04:45,320] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:04:52.018Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14 14:04:52,018] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:04:55.988Z","""vmXXXX""","""airflow""","""airflow-webserver-6686130f-cb74-c46b-2bff-a81c723030ea""","[2022-12-14 14:04:55,987] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:08:44.904Z","""vmXXXX""","""airflow""","""airflow-init-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14 14:08:44,904] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:08:49.385Z","""vmXXXX""","""airflow""","""airflow-init-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14 14:08:49,384] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:09:18.278Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14 14:09:18,278] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:09:24.433Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14 14:09:24,433] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
"2022-12-14T14:09:29.312Z","""vmXXXX""","""airflow""","""airflow-webserver-b75ff09a-cfc0-dad0-0ece-8d3bdcda9553""","[2022-12-14 14:09:29,311] {providers_manager.py:433} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis"
```
Looking forward to your observations! Do let me know if there's any more
information I can provide. :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]