[
https://issues.apache.org/jira/browse/AIRFLOW-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000910#comment-17000910
]
Nidhi commented on AIRFLOW-5881:
--------------------------------
[~ash] Sure! Here is the DAG file I am using; when you trigger this DAG, it
generates around 60,000 tasks.
{code:python}
import os
from datetime import timedelta

import airflow
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'Airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'deidentification_module_dag',
    default_args=default_args,
    schedule_interval=None)

inDir = Variable.get('indir_deid_module')
outDir = Variable.get('indir_numpy_module')

# One BashOperator per study file in the input directory.
for studies in os.listdir(inDir):
    tasks = BashOperator(
        task_id='{}'.format(studies),
        params={'inputStudyFile': os.path.join(inDir, studies),
                'outputStudyDir': outDir},
        bash_command="""python /home/ntrivedi2/airflow/dags/deidentification.py \
-K '{{ dag_run.conf['key'] }}' -i '{{ dag_run.conf['iv'] }}' \
-s '{{ params.inputStudyFile }}' -o '{{ params.outputStudyDir }}'""",
        dag=dag,
    )
{code}
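For what it's worth, the task count here is simply the number of entries in the input directory, which is how the DAG reaches ~60,000 tasks. A minimal sketch of the same dynamic-task-id pattern, with a hypothetical temporary directory standing in for the real study directory and no Airflow dependency, is:

{code:python}
import os
import tempfile

def task_ids_for_dir(in_dir):
    """Mirror the DAG's loop: one task id per entry in the input directory."""
    return ['{}'.format(study) for study in sorted(os.listdir(in_dir))]

# Hypothetical stand-in for the real study directory.
with tempfile.TemporaryDirectory() as in_dir:
    for name in ('study_a', 'study_b', 'study_c'):
        open(os.path.join(in_dir, name), 'w').close()
    print(task_ids_for_dir(in_dir))  # one task id per file
{code}

So a directory with 60,000 study files yields 60,000 tasks in a single DAG, all of which the scheduler must move out of the "scheduled" state.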
> Dag gets stuck in "Scheduled" State when scheduling a large number of tasks
> ---------------------------------------------------------------------------
>
> Key: AIRFLOW-5881
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5881
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.6
> Reporter: David Hartig
> Priority: Critical
> Attachments: 2 (1).log, airflow.cnf
>
>
> Running with the KubernetesExecutor in an AKS cluster, when we upgraded to
> version 1.10.6 we noticed that all the DAGs stopped making progress; tasks
> started running and immediately exited with the following message:
> "Instance State' FAILED: Task is in the 'scheduled' state which is not a
> valid state for execution. The task must be cleared in order to be run."
> See attached log file for the worker. Nothing seems out of the ordinary in
> the Scheduler log.
> Reverting to 1.10.5 clears the problem.
> Note that at the time of the failure maybe 100 or so tasks are in this state,
> with 70 coming from one highly parallelized dag. Clearing the scheduled tasks
> just makes them reappear shortly thereafter. Marking them "up_for_retry"
> results in one being executed but then the system is stuck in the original
> zombie state.
> Also attached is a redacted airflow config file.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)