By the time "INFO - Task exited with return code 0" gets logged, the task should have been marked as successful by the subprocess. I have no specific intuition as to what the issue may be.
I'm guessing at that point the job stops emitting heartbeat and eventually the scheduler will handle it as a failure? How often does that happen?

Max

On Fri, Jul 28, 2017 at 9:43 AM, Marc Weil <[email protected]> wrote:

> From what I can tell, it only affects CeleryExecutor. I've never seen this
> behavior with LocalExecutor before.
>
> Max, do you know anything about this type of failure mode?
>
> --
> Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
> New Relic
>
> On Fri, Jul 28, 2017 at 5:48 AM, Jonas Karlsson <[email protected]> wrote:
>
> > We have the exact same problem. In our case, it's a bash operator starting
> > a docker container. The container and process it ran exit, but the 'docker
> > run' command is still showing up in the process table, waiting for an
> > event. I'm trying to switch to LocalExecutor to see if that will help.
> >
> > _jonas
> >
> > On Thu, Jul 27, 2017 at 4:28 PM Marc Weil <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > Has anyone seen the behavior when using CeleryExecutor where workers will
> > > finish their tasks ("INFO - Task exited with return code 0" shows in the
> > > logs) but are never marked as complete in the airflow DB or UI?
> > > Effectively this causes tasks to hang even though they are complete, and
> > > the DAG will not continue.
> > >
> > > This is happening on 1.8.0. Anyone else seen this or perhaps have a
> > > workaround?
> > >
> > > Thanks!
> > >
> > > --
> > > Marc Weil | Lead Engineer | Growth Automation, Marketing, and Engagement |
> > > New Relic
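For readers of this thread: the executor switch Jonas mentions is a change in airflow.cfg. A minimal sketch, assuming Airflow 1.8's standard [core] and [scheduler] keys; the connection string is a placeholder, and LocalExecutor requires a non-SQLite metadata database:

    [core]
    # LocalExecutor runs tasks as subprocesses on the scheduler host instead
    # of dispatching them to Celery workers, which avoids the Celery result
    # reporting path discussed above.
    executor = LocalExecutor
    # Any server database supported by SQLAlchemy works; SQLite does not
    # support the concurrent access LocalExecutor needs.
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

    [scheduler]
    # The heartbeat Max refers to: a running task's job heartbeats on this
    # interval, and task instances whose job stops heartbeating are
    # eventually treated as failed by the scheduler. 5 is the 1.8 default.
    job_heartbeat_sec = 5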
