Awesome, thanks Bolke!

On Wed, Jun 14, 2017 at 6:10 PM Bolke de Bruin <[email protected]> wrote:

> I have created PR https://github.com/apache/incubator-airflow/pull/2365 for this issue.
>
> Bolke
>
> > On 14 Jun 2017, at 16:26, Bolke de Bruin <[email protected]> wrote:
> >
> > Sorry, I missed your comment on the DAG. Will have a look.
> >
> >> On 14 Jun 2017, at 13:42, Daniel Huang <[email protected]> wrote:
> >>
> >> I think this is the same issue I've been hitting with ShortCircuitOperator
> >> and LatestOnlyOperator. I filed
> >> https://issues.apache.org/jira/browse/AIRFLOW-1296 a few days ago. It
> >> includes a DAG I can consistently reproduce this with on 1.8.1 and
> >> master. I get the "This should not happen" log message as well and the
> >> DAG fails.
> >>
> >> On Wed, Jun 14, 2017 at 3:27 AM, Bolke de Bruin <[email protected]> wrote:
> >>
> >>> Please provide the full logs (you are cutting out too much info), the
> >>> DAG definition (sanitized), and the Airflow version.
> >>>
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <[email protected]> wrote:
> >>>>
> >>>> I currently have a DAG that follows this pattern:
> >>>>
> >>>> short_circuit_operator -> s3_sensor -> downstream_task_1 ->
> >>>> downstream_task_2
> >>>>
> >>>> When the short circuit evaluates to false, s3_sensor is skipped, the
> >>>> other downstream task states remain None, and the DAG run fails.
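> >>>>
> >>>> For reference, a minimal sketch of the DAG (the callable, task names,
> >>>> bucket/key, and connection are placeholders, and imports follow the
> >>>> 1.8-style module paths):
> >>>>
> >>>> ```
> >>>> from datetime import datetime
> >>>>
> >>>> from airflow import DAG
> >>>> from airflow.operators.dummy_operator import DummyOperator
> >>>> from airflow.operators.python_operator import ShortCircuitOperator
> >>>> from airflow.operators.sensors import S3KeySensor
> >>>>
> >>>> dag = DAG('test_inbound', start_date=datetime(2017, 6, 1),
> >>>>           schedule_interval='@daily')
> >>>>
> >>>> short_circuit = ShortCircuitOperator(
> >>>>     task_id='short_circuit_operator',
> >>>>     python_callable=lambda: False,  # force the skip path to reproduce
> >>>>     dag=dag)
> >>>>
> >>>> s3_sensor = S3KeySensor(
> >>>>     task_id='on_s3_xyz',
> >>>>     bucket_key='s3://some-bucket/some-key',  # placeholder key
> >>>>     dag=dag)
> >>>>
> >>>> downstream_task_1 = DummyOperator(task_id='downstream_task_1', dag=dag)
> >>>> downstream_task_2 = DummyOperator(task_id='downstream_task_2', dag=dag)
> >>>>
> >>>> short_circuit >> s3_sensor >> downstream_task_1 >> downstream_task_2
> >>>> ```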
> >>>>
> >>>> A couple of questions:
> >>>>
> >>>> 1) Which part/component of the application (scheduler/operator/?) takes
> >>>> care of cascading the skipped status to downstream tasks? The
> >>>> ShortCircuitOperator only seems to update the immediate downstream
> >>>> tasks.
> >>>>
> >>>> 2) Using the CeleryExecutor seems to cause this. Are there any other
> >>>> logs I can check or processes I can run to figure out the root of the
> >>>> problem?
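> >>>>
> >>>> For what it's worth, this is roughly how I have been checking the task
> >>>> instance states outside the UI (a rough sketch against the metadata DB
> >>>> session; the dag_id and execution_date are from the failed run below,
> >>>> and the airflow task_state CLI command gives the same information per
> >>>> task):
> >>>>
> >>>> ```
> >>>> from datetime import datetime
> >>>>
> >>>> from airflow import settings
> >>>> from airflow.models import TaskInstance
> >>>>
> >>>> session = settings.Session()
> >>>> tis = (session.query(TaskInstance)
> >>>>        .filter(TaskInstance.dag_id == 'test_inbound',
> >>>>                TaskInstance.execution_date == datetime(2017, 6, 5, 9, 0))
> >>>>        .all())
> >>>> for ti in tis:
> >>>>     print(ti.task_id, ti.state)
> >>>> session.close()
> >>>> ```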
> >>>>
> >>>> More details below
> >>>>
> >>>> * ShortCircuitOperator log (the first downstream task is set to
> >>>> skipped, although the log shows a warning):
> >>>>
> >>>> ```
> >>>> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
> >>>> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task: on_s3_xyz
> >>>> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
> >>>> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not happen.
> >>>> ```
> >>>>
> >>>> * Scheduler log (marks the DAG run as failed):
> >>>>
> >>>> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO - Deadlock; marking run <DagRun test_inbound @ 2017-06-05 09:00:00: scheduled__2017-06-05T09:00:00, externally triggered: False> failed
> >>>>
> >>>> When I check the DAG run and step through the code, it looks like the
> >>>> trigger rule evaluates to false because the upstream task is "skipped":
> >>>>
> >>>> ```
> >>>> Previous Dagrun State  True   The task did not have depends_on_past set.
> >>>> Not In Retry Period    True   The task instance was not marked for retrying.
> >>>> Trigger Rule           False  Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1, 'done': 1, 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
> >>>> ```
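> >>>>
> >>>> Just to spell out the arithmetic: with 'all_success', once on_s3_xyz
> >>>> ends up SKIPPED the downstream task can never become runnable, the
> >>>> remaining tasks stay at None, and the scheduler eventually declares the
> >>>> run deadlocked. An illustrative paraphrase (not the actual Airflow
> >>>> code):
> >>>>
> >>>> ```
> >>>> # Illustrative paraphrase, not the Airflow source.
> >>>> def all_success_satisfied(upstream_states):
> >>>>     # 'all_success' passes only if every upstream task ended in success.
> >>>>     return all(state == 'success' for state in upstream_states)
> >>>>
> >>>> upstream = ['skipped']  # state of on_s3_xyz after the short circuit
> >>>> print(all_success_satisfied(upstream))  # False, and it can never change
> >>>> # Upstream is 'done' but nothing can run -> the run is marked deadlocked.
> >>>> ```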
> >>>
> >
>
>
