Yup. I have added retries to the tasks (the count depends on whether the job is short-running or long-running). I also have a DAG-level dagrun_timeout of 1 hour, but for some reason this doesn't work: if a task fails all of its retries, the DAG remains in the running state forever, and no further jobs are scheduled once the number of active DAGs reaches the parallelism param set in airflow.cfg. I did start a separate thread for this.
The goal is to have an automated pipeline monitor that runs behind the main pipeline (maybe once a day) and '*clear*'s the state of failed tasks (or, ideally, puts them in a state where the scheduler picks them up) so that they run again. This would mean one does not have to manually re-run a failed task (once the bug gets fixed). I am doing this with scripts that use the airflow CLI: "airflow task_state" and "airflow clear".

Thanks

On Mon, Jun 27, 2016 at 8:25 AM, Lance Norskog <[email protected]> wrote:

> You can add retries to the task, including a timeout and a counter. So,
> 5 retries with an hour in between might be a strategy.
>
>
> On Sat, Jun 25, 2016 at 7:24 PM, harish singh <[email protected]>
> wrote:
>
> > Hi guys,
> >
> > I am trying to build a pipeline/script to monitor our Data-processing
> > pipeline :)
> >
> > Basically, I am trying to do these things:
> > 1. Go back in time n hours and get the status of a TASK for the last n
> > hours (assuming hourly jobs).
> > I can use the airflow CLI command "task_state" to achieve this.
> >
> > So this tells me whether the job has failed/succeeded/is running etc.
> >
> >
> > 2. Once I figure out that some execution of a TASK has state "failed", I
> > want to change the state to "running" again, so that the scheduler picks
> > it up and runs it.
> > *Is there a way to do this?*
> >
> > I think one way to do this is:
> > if a Task is in failed state ---> use "airflow clear" and CLEAR the
> > state, so that the scheduler picks it up.
> > But I am not sure how much I can depend on this approach. Will this
> > always work?
> >
> >
> > I just want to think out loud and know if there is a better way of doing
> > this that I am not looking at. Either through code? A new monitoring
> > pipeline?
> >
> >
> > Thanks,
> > Harish
> >
>
>
>
> --
> Lance Norskog
> [email protected]
> Redwood City, CA
>
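
For reference, a rough sketch of the kind of monitor script described at the top of this mail: walk back over the last n hourly execution dates, ask "airflow task_state" for each one, and run "airflow clear" on the ones that report "failed". This is only an illustration under several assumptions, not a tested tool: the function names and the dag/task ids are made up, the `-c` (no confirmation), `-t`, `-s` and `-e` flags of "airflow clear" should be checked against `airflow clear -h` for your version, and the script assumes the state is the last non-empty line that "airflow task_state" prints.

```python
from datetime import datetime, timedelta
import subprocess


def last_n_hours(now, n):
    """Return the n hourly execution dates before the current hour, oldest first."""
    top = now.replace(minute=0, second=0, microsecond=0)
    return [top - timedelta(hours=i) for i in range(n, 0, -1)]


def task_state_cmd(dag_id, task_id, execution_date):
    # airflow task_state <dag_id> <task_id> <execution_date>
    return ["airflow", "task_state", dag_id, task_id, execution_date.isoformat()]


def clear_cmd(dag_id, task_id, execution_date):
    # Assumed flags: -c skips the confirmation prompt, -t is a task regex,
    # -s/-e bound the execution dates. Verify with `airflow clear -h`.
    stamp = execution_date.isoformat()
    return ["airflow", "clear", "-c", "-t", "^{}$".format(task_id),
            "-s", stamp, "-e", stamp, dag_id]


def parse_state(output):
    # Assumption: the CLI may log other lines first; the state comes last.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else None


def clear_failed(dag_id, task_id, now, n, run=subprocess.check_output):
    """Clear every failed run of one task in the last n hours; return their dates."""
    cleared = []
    for execution_date in last_n_hours(now, n):
        out = run(task_state_cmd(dag_id, task_id, execution_date))
        if isinstance(out, bytes):
            out = out.decode("utf-8", "replace")
        if parse_state(out) == "failed":
            run(clear_cmd(dag_id, task_id, execution_date))
            cleared.append(execution_date)
    return cleared


if __name__ == "__main__":
    # Hypothetical ids; replace with a real DAG and task.
    print(clear_failed("my_dag", "my_task", datetime.now(), 24))
```

The `run` parameter is there only so the loop can be exercised with a stub instead of the real CLI. Whether a cleared task instance actually gets picked up again still depends on the scheduler and on the DAG run's state, which is the open question in this thread.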
