Awesome, thanks Bolke!

On Wed, Jun 14, 2017 at 6:10 PM Bolke de Bruin <[email protected]> wrote:
> I have created PR https://github.com/apache/incubator-airflow/pull/2365 for this issue.
>
> Bolke
>
>
> On 14 Jun 2017, at 16:26, Bolke de Bruin <[email protected]> wrote:
> >
> > Sorry, missed your comment on the dag. Will have a look.
> >
> >> On 14 Jun 2017, at 13:42, Daniel Huang <[email protected]> wrote:
> >>
> >> I think this is the same issue I've been hitting with ShortCircuitOperator
> >> and LatestOnlyOperator. I filed
> >> https://issues.apache.org/jira/browse/AIRFLOW-1296 a few days ago. It
> >> includes a DAG I can consistently reproduce this with on 1.8.1 and master.
> >> I get the "This should not happen" log message as well and the DAG fails.
> >>
> >> On Wed, Jun 14, 2017 at 3:27 AM, Bolke de Bruin <[email protected]> wrote:
> >>
> >>> Please provide the full logs (you are cutting out too much info), dag
> >>> definition (sanitized), airflow version.
> >>>
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 13 Jun 2017, at 23:51, Rajesh Chamarthi <[email protected]> wrote:
> >>>>
> >>>> I currently have a dag which follows this pattern:
> >>>>
> >>>> short_circuit_operator -> s3_sensor -> downstream_task_1 -> downstream_task_2
> >>>>
> >>>> When the short circuit evaluates to false, s3_sensor is skipped, the other
> >>>> downstream task states remain at None, and the DAG run fails.
> >>>>
> >>>> A couple of questions:
> >>>>
> >>>> 1) Which part/component of the application (scheduler/operator/?) takes
> >>>> care of cascading the skipped status to downstream jobs? The
> >>>> ShortCircuitOperator only seems to update the immediate downstream tasks.
> >>>>
> >>>> 2) Using the CeleryExecutor seems to cause this. Are there any other logs
> >>>> or processes I can run to figure out the root of the problem?
> >>>>
> >>>> More details below.
> >>>>
> >>>> * ShortCircuitOperator log (the first downstream task is set to skipped,
> >>>> although the log shows a warning):
> >>>>
> >>>> ```
> >>>> [2017-06-12 09:00:24,552] {base_task_runner.py:95} INFO - Subtask:
> >>>> [2017-06-12 09:00:24,552] {python_operator.py:177} INFO - Skipping task:
> >>>> on_s3_xyz
> >>>> [2017-06-12 09:00:24,553] {base_task_runner.py:95} INFO - Subtask:
> >>>> [2017-06-12 09:00:24,553] {python_operator.py:188} WARNING - Task
> >>>> <Task(S3KeySensor): on_s3_xyz> was not part of a dag run. This should not
> >>>> happen.
> >>>> ```
> >>>>
> >>>> * Scheduler log (marks the DAG run as failed):
> >>>>
> >>>> ```
> >>>> [2017-06-13 17:57:20,983] {models.py:4184} DagFileProcessor43 INFO -
> >>>> Deadlock; marking run <DagRun test_inbound @ 2017-06-05 09:00:00:
> >>>> scheduled__2017-06-05T09:00:00, externally triggered: False> failed
> >>>> ```
> >>>>
> >>>> When I check the dag run and step through the code, it looks like the
> >>>> trigger rule evaluates to false because the upstream task is "skipped":
> >>>>
> >>>> ```
> >>>> Previous Dagrun State  True   The task did not have depends_on_past set.
> >>>> Not In Retry Period    True   The task instance was not marked for retrying.
> >>>> Trigger Rule           False  Task's trigger rule 'all_success' requires all
> >>>> upstream tasks to have succeeded, but found 1 non-success(es).
> >>>> upstream_tasks_state={'failed': 0, 'successes': 0, 'skipped': 1, 'done': 1,
> >>>> 'upstream_failed': 0}, upstream_task_ids=['on_s3_xyz']
> >>>> ```
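[Editor's note] For readers following the thread, below is a minimal sketch of the DAG pattern Rajesh describes (ShortCircuitOperator -> S3KeySensor -> two downstream tasks). It is not the original (sanitized) DAG: the DAG id, schedule, S3 key, the always-False callable, and all task ids except on_s3_xyz are hypothetical placeholders, and it assumes Airflow 1.8.x-era module paths.

```
# Minimal sketch of the pattern discussed above, assuming Airflow 1.8.x-era
# module paths. DAG id, schedule, S3 key, and the always-False callable are
# placeholders; only the task id 'on_s3_xyz' comes from the thread.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import ShortCircuitOperator
from airflow.operators.sensors import S3KeySensor

dag = DAG(
    dag_id='short_circuit_skip_repro',
    start_date=datetime(2017, 6, 1),
    schedule_interval='@daily',
)

# Returning False makes the ShortCircuitOperator mark its downstream task(s)
# as skipped; the question in the thread is which component cascades that
# 'skipped' state past the immediate downstream task.
short_circuit = ShortCircuitOperator(
    task_id='short_circuit',
    python_callable=lambda: False,
    dag=dag,
)

s3_sensor = S3KeySensor(
    task_id='on_s3_xyz',
    bucket_key='s3://some-bucket/some-key',  # placeholder key
    dag=dag,
)

downstream_task_1 = DummyOperator(task_id='downstream_task_1', dag=dag)
downstream_task_2 = DummyOperator(task_id='downstream_task_2', dag=dag)

# All downstream tasks keep the default trigger_rule='all_success', which is
# why a skipped upstream task leaves them at state None and the dep check
# reports "found 1 non-success(es)", as shown in the quoted output.
short_circuit.set_downstream(s3_sensor)
s3_sensor.set_downstream(downstream_task_1)
downstream_task_1.set_downstream(downstream_task_2)
```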
