Is there any error in the logs of the task instance (on the worker) towards
the end? Perhaps the airflow process is unable to communicate with the
database in some cases? Maybe network or database being unresponsive? If
that is the case there should be a stack trace in the log.

In any case, the task should stop emitting heartbeats, and the scheduler
should eventually mark it as failed. If you have retries set up, the
scheduler should then proceed to starting a retry.

Max

On Wed, Jan 11, 2017 at 4:09 PM, Ken Edwards <[email protected]> wrote:

> I've successfully been using Airflow the past few months to run relatively
> simple dags on a regular cadence. However, I've run into stability issues
> with more complex workflows that consist of parent/child dags, some of
> which may contain hundreds of tasks. The most common symptom is that state
> transitions do not always happen even though the previous task succeeds,
> requiring human monitoring/prodding when it gets stuck.
>
> I would appreciate any advice on the things to look at to debug this or
> common causes for this.
>
> Thank you,
> Ken
>

Reply via email to