By the time "INFO - Task exited with return code 0" gets logged, the task
should already have been marked as successful by the subprocess, so I have
no specific intuition as to what the issue might be.

I'm guessing that at that point the job stops emitting heartbeats and the
scheduler will eventually handle it as a failure?
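
If you want to check, something like the following (a rough, untested
sketch against the 1.8 metadata models; column names may differ in your
version) should list task instances still marked as running whose
backing job has a stale heartbeat:

    # Rough, untested sketch: find RUNNING task instances whose job
    # heartbeat has gone stale. The 5-minute cutoff is arbitrary.
    from datetime import datetime, timedelta

    from airflow import settings
    from airflow.jobs import BaseJob
    from airflow.models import TaskInstance
    from airflow.utils.state import State

    session = settings.Session()
    cutoff = datetime.utcnow() - timedelta(minutes=5)

    stuck = (
        session.query(TaskInstance, BaseJob)
        .filter(TaskInstance.state == State.RUNNING)
        .filter(TaskInstance.job_id == BaseJob.id)
        .filter(BaseJob.latest_heartbeat < cutoff)
        .all()
    )
    for ti, job in stuck:
        print(ti.dag_id, ti.task_id, ti.execution_date,
              job.latest_heartbeat)

    session.close()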

How often does that happen?

Max

On Fri, Jul 28, 2017 at 9:43 AM, Marc Weil <[email protected]> wrote:

> From what I can tell, it only affects CeleryExecutor. I've never seen this
> behavior with LocalExecutor before.
>
> Max, do you know anything about this type of failure mode?
>
> --
> Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
> New Relic
>
> On Fri, Jul 28, 2017 at 5:48 AM, Jonas Karlsson <[email protected]>
> wrote:
>
> > We have the exact same problem. In our case, it's a bash operator
> > starting a docker container. The container and process it ran exit,
> > but the 'docker run' command is still showing up in the process
> > table, waiting for an event.
> > I'm trying to switch to LocalExecutor to see if that will help.
> >
> > _jonas
> >
> >
> > On Thu, Jul 27, 2017 at 4:28 PM Marc Weil <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > Has anyone seen the behavior when using CeleryExecutor where
> > > workers will finish their tasks ("INFO - Task exited with return
> > > code 0" shows in the logs) but are never marked as complete in the
> > > airflow DB or UI? Effectively this causes tasks to hang even though
> > > they are complete, and the DAG will not continue.
> > >
> > > This is happening on 1.8.0. Anyone else seen this or perhaps have a
> > > workaround?
> > >
> > > Thanks!
> > >
> > > --
> > > Marc Weil | Lead Engineer | Growth Automation, Marketing, and
> > > Engagement | New Relic
> > >
> >
>
