[
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965398#comment-16965398
]
Matt Blaha edited comment on AIRFLOW-2319 at 11/2/19 3:22 PM:
--------------------------------------------------------------
This is a big issue for me. As in the case [~TrevorEdwards] mentioned above, I
have a series of tasks in a single DAG that gets kicked off repeatedly with
different parameters via external trigger. All of the external sources have to
ensure that their execution times are at least one second apart, but in my
case they don't communicate with each other at all. Nothing I've read implies
that Airflow is designed so that runs shouldn't be triggered this way.
I manually removed the constraint, and several thousand runs completed just
fine.
If this is by design, as [~ash] suggested above, could someone please explain
why and suggest an alternative way to prevent a high volume of DAG runs with
different parameters from failing? If they should fail, I agree with the
comments above: what would the appropriate error message be?
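To illustrate the timing problem, here is a minimal sketch in plain Python,
with no Airflow involved. It assumes that execution dates are kept at
one-second resolution, which is what the one-second spacing requirement above
implies. The DAG name, the trigger times, and the helper
{{execution_date_for}} are all hypothetical.

```python
from datetime import datetime, timedelta


def execution_date_for(trigger_time):
    # Assumption: the execution date is the trigger time truncated to
    # whole seconds.
    return trigger_time.replace(microsecond=0)


base = datetime(2019, 11, 2, 15, 22, 0)

# Four hypothetical external sources fire without coordinating with each
# other: three within the same second, one in the following second.
trigger_times = [base + timedelta(milliseconds=ms) for ms in (100, 450, 900, 1200)]

seen = set()   # (dag_id, execution_date) pairs already taken
accepted = []  # triggers that would get their own DAG run
rejected = []  # triggers that collide with an earlier run

for trigger_time in trigger_times:
    key = ("example_dag", execution_date_for(trigger_time))
    if key in seen:
        rejected.append(trigger_time)
    else:
        seen.add(key)
        accepted.append(trigger_time)

print(len(accepted), "accepted,", len(rejected), "rejected")
```

Under that assumption, only the first trigger in any given second gets a run,
no matter how different the parameters of the later ones are.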
> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> ------------------------------------------------------------------
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun
> Affects Versions: 1.9.0
> Reporter: Andreas Költringer
> Priority: Major
>
> Inserting DagRuns via {{airflow.api.common.experimental.trigger_dag}}
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird, as the {{session.add()}} and {{session.commit()}} calls come
> right before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
> id INTEGER NOT NULL,
> dag_id VARCHAR(250),
> execution_date DATETIME,
> state VARCHAR(50),
> run_id VARCHAR(250),
> external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date DATETIME,
> PRIMARY KEY (id),
> UNIQUE (dag_id, execution_date),
> UNIQUE (dag_id, run_id),
> CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index.)
> The {{DagRun}} class in {{models.py}} does not reflect this; however, it is in
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this
> is not reflected in the model, I guess this is a bug?
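The effect of the constraint in the quoted schema can be reproduced outside
Airflow. The following is a rough sketch using Python's built-in {{sqlite3}}
module and an in-memory database. The table is a trimmed copy of the schema
above, and the helper {{insert_run}}, the DAG name, and the run IDs are
hypothetical. It shows only the database-level rejection, not how Airflow
itself reports the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copy of the dag_run schema quoted above, keeping both UNIQUE
# constraints.
conn.execute(
    """
    CREATE TABLE dag_run (
        id INTEGER NOT NULL,
        dag_id VARCHAR(250),
        execution_date DATETIME,
        state VARCHAR(50),
        run_id VARCHAR(250),
        external_trigger BOOLEAN,
        PRIMARY KEY (id),
        UNIQUE (dag_id, execution_date),
        UNIQUE (dag_id, run_id)
    )
    """
)


def insert_run(dag_id, execution_date, run_id):
    """Insert one row; return False if a UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run "
                "(dag_id, execution_date, state, run_id, external_trigger) "
                "VALUES (?, ?, 'running', ?, 1)",
                (dag_id, execution_date, run_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Same dag_id and execution_date but a different run_id: the second insert
# violates UNIQUE (dag_id, execution_date).
first = insert_run("example_dag", "2018-04-13 10:00:00", "run_a")
second = insert_run("example_dag", "2018-04-13 10:00:00", "run_b")
# One second later the insert is accepted again.
third = insert_run("example_dag", "2018-04-13 10:00:01", "run_c")

row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
print(first, second, third, row_count)
```

The second insert is rejected even though its {{run_id}} is distinct, so
{{UNIQUE (dag_id, run_id)}} alone would have allowed it.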