[ 
https://issues.apache.org/jira/browse/AIRFLOW-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000910#comment-17000910
 ] 

Nidhi commented on AIRFLOW-5881:
--------------------------------

[~ash] Sure! Here is the DAG file which I am using  inside this dag file when 
you trigger it there will be around 60000 tasks.

 

 
{code:java}
import os
from datetime import datetime , timedelta
import airflow
import psycopg2
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dagrun_operator import TriggerDagRunOperator
import psycopg2
default_args = {
 'owner': 'Airflow' ,
 'depends_on_past': False ,
 'start_date': airflow.utils.dates.days_ago(2) ,
 'retries': 1 ,
 'retry_delay': timedelta(minutes=1) ,
}
dag = DAG(
 'deidentification_module_dag' ,
   default_args=default_args ,
    schedule_interval=None)
# %%
inDir = Variable.get('indir_deid_module')
outDir = Variable.get('indir_numpy_module')
# %%
for studies in os.listdir(inDir):
 tasks = BashOperator(
      task_id='{}'.format(studies),
      params={'inputStudyFile': os.path.join(inDir, studies), 'outputStudyDir': 
outDir},
      bash_command="""python /home/ntrivedi2/airflow/dags/deidentification.py 
-K '{{dag_run.conf['key']}}' -i '{{dag_run.conf['iv']}}' -s       '{{ 
params.inputStudyFile }}' -o '{{params.outputStudyDir}}'""",
      dag=dag,
      )
{code}
 

> Dag gets stuck in "Scheduled" State when scheduling a large number of tasks
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5881
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5881
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: David Hartig
>            Priority: Critical
>         Attachments: 2 (1).log, airflow.cnf
>
>
> Running with the KubernetesExecutor in and AKS cluster, when we upgraded to 
> version 1.10.6 we noticed that the all the Dags stop making progress but 
> start running and immediate exiting with the following message:
> "Instance State' FAILED: Task is in the 'scheduled' state which is not a 
> valid state for execution. The task must be cleared in order to be run."
> See attached log file for the worker. Nothing seems out of the ordinary in 
> the Scheduler log. 
> Reverting to 1.10.5 clears the problem.
> Note that at the time of the failure maybe 100 or so tasks are in this state, 
> with 70 coming from one highly parallelized dag. Clearing the scheduled tasks 
> just makes them reappear shortly thereafter. Marking them "up_for_retry" 
> results in one being executed but then the system is stuck in the original 
> zombie state. 
> Attached is the also a redacted airflow config flag. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to