nookcreed opened a new issue, #36920:
URL: https://github.com/apache/airflow/issues/36920
### Apache Airflow version
Other Airflow 2 version (please specify below)
### If "Other Airflow 2 version" selected, which one?
2.7.3
### What happened?
We are encountering an issue in our Apache Airflow setup where, after a few
successful DagRuns, the scheduler stops scheduling new runs. The scheduler logs
indicate:
`{scheduler_job_runner.py:1426} INFO - DAG dag-test scheduling was skipped,
probably because the DAG record was locked.`
This problem persists despite running a single scheduler pod. Notably,
reverting the changes from [PR
#31414](https://github.com/apache/airflow/pull/31414) resolves this issue. A
similar issue has been discussed on Stack Overflow: [Airflow Kubernetes
Executor Scheduling Skipped Because Dag Record Was
Locked](https://stackoverflow.com/questions/77405009/airflow-kubernetes-executor-scheduling-skipped-because-dag-record-was-locked).
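For context on why this log line appears: on PostgreSQL the scheduler takes a row-level lock on each DAG record (`SELECT ... FOR UPDATE SKIP LOCKED`) and skips any DAG whose row is already locked by another transaction. A minimal illustrative sketch of that skip semantics (plain Python, not Airflow's actual code; the names are hypothetical):

```python
# Illustrative sketch (hypothetical names, not Airflow code): the scheduler
# skips any DAG whose record is already row-locked, mirroring PostgreSQL's
# SELECT ... FOR UPDATE SKIP LOCKED semantics.
held_locks = set()  # dag_ids whose rows are locked by some transaction

def try_schedule(dag_id):
    """Attempt to lock the DAG record; skip (return False) if already locked."""
    if dag_id in held_locks:
        print(f"DAG {dag_id} scheduling was skipped, "
              "probably because the DAG record was locked.")
        return False
    held_locks.add(dag_id)      # acquire the row lock
    # ... create the DagRun here ...
    held_locks.discard(dag_id)  # committing the transaction releases the lock
    return True

# Normal case: the lock is free, the DAG gets scheduled.
assert try_schedule("dag-test") is True

# The failure mode reported here: a lock that is never released means every
# subsequent scheduling attempt for that DAG is skipped.
held_locks.add("dag-test")      # simulate a stale, unreleased lock
assert try_schedule("dag-test") is False
```

In this model the skip is harmless as long as the lock holder commits or rolls back promptly; the symptom above looks like a transaction that never releases the row.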
### What you think should happen instead?
The scheduler should continue to schedule new DagRuns according to each DAG's
schedule, without interruption from stale DAG record locks.
### How to reproduce
1. Run Airflow v2.7.3 on Kubernetes (HA is not required; a single scheduler reproduces it).
2. Trigger multiple DagRuns (we have about 10 DAGs that run every minute).
3. Observe scheduler behavior and logs after a few successful runs; the error
shows up after a few minutes.
### Operating System
centos7
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.10.0
apache-airflow-providers-apache-hive==6.2.0
apache-airflow-providers-apache-livy==3.6.0
apache-airflow-providers-cncf-kubernetes==7.8.0
apache-airflow-providers-common-sql==1.8.0
apache-airflow-providers-ftp==3.6.0
apache-airflow-providers-google==10.11.0
apache-airflow-providers-http==4.6.0
apache-airflow-providers-imap==3.4.0
apache-airflow-providers-papermill==3.4.0
apache-airflow-providers-postgres==5.7.1
apache-airflow-providers-presto==5.2.1
apache-airflow-providers-salesforce==5.5.0
apache-airflow-providers-snowflake==5.1.0
apache-airflow-providers-sqlite==3.5.0
apache-airflow-providers-trino==5.4.0
### Deployment
Other
### Deployment details
We have wrappers around the official Airflow Helm chart and Docker images.
Environment:
- Airflow version: 2.7.3
- Kubernetes version: 1.24
- Executor: KubernetesExecutor
- Database: PostgreSQL (metadata database)
- Environment/Infrastructure: Kubernetes cluster running Airflow in Docker containers
### Anything else?
Actual Behavior:
The scheduler stops scheduling new runs after a few DagRuns, with log
messages about the DAG record being locked.
Workaround:
Restarting the scheduler pod releases the lock and allows normal scheduling
to resume, but this is not viable in production. Reverting the changes in [PR
#31414](https://github.com/apache/airflow/pull/31414) also resolves the issue.
Questions/Request for Information:
1. Under what scenarios is the lock on a DAG record typically not released?
2. Are there known issues in Airflow 2.7.3, or specific configurations, that
might cause the DAG record to remain locked, thereby preventing new run
scheduling?
3. Could the changes made in [PR
#31414](https://github.com/apache/airflow/pull/31414) be related to this issue?
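To quantify how often each DAG is affected while this is investigated, we triage the scheduler logs by counting the skip messages per DAG id. A small hypothetical helper (plain Python; the pattern matches the log message quoted above, and the sample lines are illustrative):

```python
import re
from collections import Counter

# Matches the skip message quoted in this report, capturing the DAG id.
SKIP_RE = re.compile(
    r"DAG (\S+) scheduling was skipped, "
    r"probably because the DAG record was locked"
)

def count_skips(log_lines):
    """Count 'scheduling was skipped' messages per DAG id."""
    counts = Counter()
    for line in log_lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Illustrative sample lines in the format quoted above.
sample = [
    "{scheduler_job_runner.py:1426} INFO - DAG dag-test scheduling was "
    "skipped, probably because the DAG record was locked.",
    "{scheduler_job_runner.py:1426} INFO - DAG dag-test scheduling was "
    "skipped, probably because the DAG record was locked.",
    "{dagrun.py:653} INFO - Marking run successful",
]
assert count_skips(sample) == Counter({"dag-test": 2})
```

A steadily growing count for the same DAG, with no successful runs in between, matches the stuck-lock behavior described above; occasional isolated skips would instead suggest normal lock contention.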
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)