Charlie created AIRFLOW-6194:
--------------------------------

             Summary: Task instances aren't running after meeting dependencies
                 Key: AIRFLOW-6194
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6194
             Project: Apache Airflow
          Issue Type: Bug
          Components: DagRun, executors, scheduler, worker
    Affects Versions: 1.10.6
            Reporter: Charlie


We recently had an issue arise with our Airflow instance which caused the 
scheduler to enter some sort of a deadlocked state in the middle of operation. 
In this state, all DAG runs were listed as 'scheduled' and it didn't appear as 
if anything at all was happening.

Initially, I thought this might be an issue with our configuration, but I 
couldn't quite track down why this issue wouldn't have arisen earlier and, 
looking at the logs, I've been seeing some strange behavior that I can't quite 
explain.

The most notable thing is that, for whatever reason, the Executor Class listed 
under all of our jobs is now 'NoneType', where it was previously 
'LocalExecutor'. Looking at our logs, this change first happened when we updated 
our instance two days prior to the initial deadlock. However, I have since 
cleared the database altogether and find that 'NoneType' still appears even when 
starting from scratch.

In these same logs, I can see jobs continuously running for this DAG run; 
however, the start and end times are less than a second apart. At the same time, 
all task instances are listed as either 'success' or 'scheduled', so I'm not 
entirely sure what the running jobs are.

If I look in the Task Instance Details for any of these scheduled tasks, I see 
{code:java}
All dependencies are met but the task instance is not running. In most cases 
this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load

If this task instance does not start soon please contact your Airflow 
administrator for assistance.{code}
Upon viewing the scheduler logs in Airflow, nothing seems awry.

So to summarize, the scheduler seems to be doing its job, as DAG runs are 
properly scheduled and set as 'running'; however, the task instances themselves 
are not completing properly. Because the jobs list 'NoneType' instead of 
'LocalExecutor', my theory is that there is some issue with the LocalExecutor 
that's causing it to not properly execute jobs. Again, clearing the database 
didn't seem to help, and I now run into this deadlock almost immediately with a 
test DAG I'm running.
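
One way to rule out a plain configuration problem before suspecting the 
LocalExecutor itself is to check which executor name the configuration resolves 
to. The following is only a minimal sketch under stated assumptions, not taken 
from the deployment described above: the helper name configured_executor is 
hypothetical, it assumes the default airflow.cfg location under AIRFLOW_HOME 
(falling back to ~/airflow), and it only reads the [core] executor setting and 
the AIRFLOW__CORE__EXECUTOR environment override. It does not reproduce 
Airflow's own default handling, so an unset value is reported as None here.

```python
import configparser
import os


def configured_executor(cfg_path=None, environ=None):
    """Return the executor name set in the environment or airflow.cfg.

    Hypothetical helper for illustration. Returns None when no value is set;
    Airflow itself would fall back to its built-in default in that case.
    """
    environ = os.environ if environ is None else environ

    # An AIRFLOW__CORE__EXECUTOR environment variable overrides airflow.cfg.
    override = environ.get("AIRFLOW__CORE__EXECUTOR")
    if override:
        return override

    if cfg_path is None:
        # Assumption: the default layout, with airflow.cfg under AIRFLOW_HOME.
        home = environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
        cfg_path = os.path.join(home, "airflow.cfg")

    # interpolation=None because airflow.cfg values may contain bare '%' signs.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently ignored
    return parser.get("core", "executor", fallback=None)


if __name__ == "__main__":
    print(configured_executor())
```

If this prints 'LocalExecutor' while the jobs still list 'NoneType', the 
configuration file is probably not the cause, which would be consistent with the 
theory above.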

If I can provide any additional information, please let me know. I'd love to 
get this resolved or figured out, as we're currently unable to use Airflow 
because of this.

Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
