Taragolis commented on PR #29616:
URL: https://github.com/apache/airflow/pull/29616#issuecomment-1437330747
Yeah, as soon as I kicked off a run and went off to do my daily routine, it finally failed 🥇 💯 😢
This is quite interesting, though some of the points below could well be "wrong
assumptions".
Dag Runs
---
```console
HTTP: GET dags/example_bash_operator/dagRuns
{'dag_runs': [{'conf': {},
               'dag_id': 'example_bash_operator',
               'dag_run_id': 'test_dag_run_id',
               'data_interval_end': '2023-02-20T00:00:00+00:00',
               'data_interval_start': '2023-02-19T00:00:00+00:00',
               'end_date': None,
               'execution_date': '2023-02-20T10:30:00.702880+00:00',
               'external_trigger': True,
               'last_scheduling_decision': None,
               'logical_date': '2023-02-20T10:30:00.702880+00:00',
               'note': None,
               'run_type': 'manual',
               'start_date': None,
               'state': 'queued'}],
 'total_entries': 1}
```
The `example_bash_operator` DAG has a schedule interval, so we should see two
DAG runs here: one scheduled and one manual. In this case there is only one,
the manual run that was created during the test.
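
The expectation above can be checked mechanically. This is a minimal sketch, assuming the response has the shape shown in the dump (trimmed here to the relevant keys); `count_run_types` is a hypothetical helper, not something that exists in the test suite:

```python
from collections import Counter


def count_run_types(response: dict) -> Counter:
    """Count DAG runs per run_type in a `GET dags/<dag_id>/dagRuns` response."""
    return Counter(run["run_type"] for run in response["dag_runs"])


# Trimmed copy of the response from the failed test run.
response = {
    "dag_runs": [
        {"dag_run_id": "test_dag_run_id", "run_type": "manual", "state": "queued"},
    ],
    "total_entries": 1,
}

counts = count_run_types(response)
# Run types we expected to see but did not.
missing = {"scheduled", "manual"} - set(counts)
print(f"run types seen: {dict(counts)}, missing: {sorted(missing)}")
```

Reporting the missing run types in the assertion message would make it clear right away that the scheduler never created its own run.
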
Scheduler Logs
---
```console
airflow-scheduler_1 |
airflow-scheduler_1 | BACKEND=redis
airflow-scheduler_1 | DB_HOST=redis
airflow-scheduler_1 | DB_PORT=6379
airflow-scheduler_1 |
airflow-scheduler_1 | /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
airflow-scheduler_1 | ____________ _____________
airflow-scheduler_1 | ____ |__( )_________ __/__ /________ __
airflow-scheduler_1 | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
airflow-scheduler_1 | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
airflow-scheduler_1 | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
airflow-scheduler_1 | [2023-02-20T10:29:14.618+0000] {executor_loader.py:114} INFO - Loaded executor: CeleryExecutor
airflow-scheduler_1 | [2023-02-20T10:29:14.664+0000] {scheduler_job.py:724} INFO - Starting the scheduler
airflow-scheduler_1 | [2023-02-20T10:29:14.665+0000] {scheduler_job.py:731} INFO - Processing each file at most -1 times
airflow-scheduler_1 | [2023-02-20T10:29:14.669+0000] {manager.py:164} INFO - Launched DagFileProcessorManager with pid: 33
airflow-scheduler_1 | [2023-02-20T10:29:14.671+0000] {scheduler_job.py:1437} INFO - Resetting orphaned tasks for active dag runs
airflow-scheduler_1 | [2023-02-20T10:29:14.685+0000] {settings.py:61} INFO - Configured default timezone Timezone('UTC')
```
That's all. It looks like the scheduler just hangs while the service still
reports itself as healthy. Could this be a problem with the recent health check
changes in https://github.com/apache/airflow/pull/29408, or maybe with the
simple HTTP server in the scheduler?
I would add the output of the `/health` endpoint in case of failure.
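
Something along these lines could produce that output. It is only a sketch: `format_health_response` and `fetch_health` are hypothetical helper names, and the URLs in the usage note below are assumptions about the quick-start setup rather than values taken from the test.

```python
import json
import urllib.error
import urllib.request


def format_health_response(status: int, body: str) -> str:
    """Render a health response as one line, normalising the body if it is JSON."""
    try:
        body = json.dumps(json.loads(body), sort_keys=True)
    except ValueError:
        # Not JSON (e.g. an HTML error page): keep the raw body.
        pass
    return f"HTTP {status} {body}"


def fetch_health(url: str, timeout: float = 5.0) -> str:
    """Describe whatever the health endpoint answers, without ever raising on I/O errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a body worth logging.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    except OSError as err:
        # Connection refused, timeout, DNS failure, ...
        return f"{url}: unreachable ({err})"
    return f"{url}: {format_health_response(status, body)}"
```

In the failure handler of the test, one could then print `fetch_health("http://localhost:8080/health")` for the webserver. The scheduler's own health check server would need its URL passed in as well; its port (8974 by default, if I remember correctly) is not published in the `docker ps` output, so that call would probably have to run inside the container.
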
Docker services after test failure
---
```console
$ docker ps
CONTAINER ID   IMAGE                                                                                 COMMAND                  CREATED         STATUS                   PORTS                                       NAMES
8da8ebd97f17   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-triggerer_1
88a829428ce8   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   quick-start_airflow-webserver_1
f3baa9496225   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-scheduler_1
134b3356ed96   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-worker_1
a5f5e8250820   redis:latest                                                                          "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes (healthy)   6379/tcp                                    quick-start_redis_1
de963f245166   postgres:13                                                                           "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes (healthy)   5432/tcp                                    quick-start_postgres_1
```
All healthy, which means the services initially passed their health checks after startup.
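
If the test collected this information itself, the check could be reduced to a list of containers that are not reported as healthy. This is a rough sketch, assuming the output comes from `docker ps --format '{{.Names}}\t{{.Status}}'`; `containers_not_healthy` is a hypothetical helper and the unhealthy row in the sample is made up for illustration:

```python
from typing import List


def containers_not_healthy(ps_output: str) -> List[str]:
    """Return names of containers whose status does not contain "(healthy)"."""
    flagged = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<name>\t<status>", as produced by the --format template.
        name, _, status = line.partition("\t")
        if "(healthy)" not in status:
            flagged.append(name)
    return flagged


sample = (
    "quick-start_airflow-scheduler_1\tUp 2 minutes (healthy)\n"
    "quick-start_airflow-worker_1\tUp 2 minutes (healthy)\n"
    # Made-up row, only to show what a flagged container looks like.
    "quick-start_redis_1\tUp 3 minutes (unhealthy)\n"
)
print(containers_not_healthy(sample))
```

Note that this only reflects what the container health checks report, which is exactly the signal that looks unreliable for the scheduler here.
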
Versions
---
```console
$ docker version
Client:
 Version:           20.10.23+azure-2
 API version:       1.41
 Go version:        go1.19.6
 Git commit:        715524332ff91d0f9ec5ab2ec95f051456ed1dba
 Built:             Wed Jan 18 20:42:16 UTC 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server:
 Engine:
  Version:          20.10.22+azure-1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       42c8b314993e5eb3cc2776da0bbe41d5eb4b707b
  Built:            Thu Dec 15 22:17:04 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18+azure-1
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        5fd4c4d144137e991c4acebb2146ab1483a97925
 docker-init:
  Version:          0.19.0
  GitCommit:
```
```console
$ docker-compose version
docker-compose version 1.29.2, build 5becea4c
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
```
This is more interesting. I've seen static checks sometimes fail with this
particular Docker version, `20.10.23+azure-2`, and I haven't seen that happen
with Docker builds without the `azure-X` suffix.
Other strange things
---
The `Prepare Breeze and PROD image` step has a lot of errors which refer to
"permission denied":
```console
Received 27910740 of 32105044 (86.9%), 26.6 MBs/sec
Received 32105044 of 32105044 (100.0%), 29.8 MBs/sec
Cache Size: ~31 MB (32105044 B)
/usr/bin/tar -xf /home/runner/work/_temp/00fdf96d-139b-4954-ad8c-852b0f051104/cache.tgz -P -C /home/runner/work/airflow/airflow -z
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages/_distutils_hack: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages/_distutils_hack/override.py: Cannot open: No such file or directory
```
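
Resolving the relative path from the tar output against the `-C` directory shows where tar is actually trying to create `.local`. This is a small sketch using only the two paths from the log above; that the runner user cannot write to the resulting directory is my assumption:

```python
import posixpath

# Directory passed to `tar -C` in the log above.
extract_dir = "/home/runner/work/airflow/airflow"
# Member path that tar fails to create.
member = "../../../../.local"

# With -P, tar keeps the leading "../" components, so the member is resolved
# relative to the extraction directory.
target = posixpath.normpath(posixpath.join(extract_dir, member))
print(target)
```

The path resolves to `/home/.local` rather than `/home/runner/.local`, i.e. one directory level too high, which would explain the "Permission denied" on `mkdir` and all the follow-up "No such file or directory" errors.
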