The GitHub Actions job "Tests" on airflow.git has failed.
Run started by GitHub user ashb (triggered by ashb).

Head commit for run:
0301b7d21bfb87ba26cced35ee7ecd18d48cbeac / Ash Berlin-Taylor <[email protected]>
Don't error when multiple tasks produce the same dataset

Previously this was racy, so running multiple DAGs all updating the
same outlet dataset (or how I ran into this: mapped tasks) would cause
some of them to fail with a unique constraint violation.

The fix has two paths, one generic and an optimized version for
Postgres.

The generic one is likely slightly slower, and uses the pattern that the
SQLA docs give for exactly this case. To quote:

> This pattern is ideal for situations such as using PostgreSQL and
> catching IntegrityError to detect duplicate rows; PostgreSQL normally
> aborts the entire transaction when such an error is raised, however when
> using SAVEPOINT, the outer transaction is maintained. In the example
> below a list of data is persisted into the database, with the occasional
> "duplicate primary key" record skipped, without rolling back the entire
> operation:
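
A minimal sketch of that SAVEPOINT pattern, not the code from this
commit. It assumes the stdlib `sqlite3` module as a stand-in database and
a hypothetical `dataset_demo` table; with SQLAlchemy the savepoint would
come from `session.begin_nested()` instead of raw SQL.

```python
import sqlite3


def insert_skipping_duplicates(conn, uris):
    """Insert each URI inside its own SAVEPOINT, skipping duplicates."""
    inserted = 0
    conn.execute("BEGIN")  # outer transaction
    for uri in uris:
        conn.execute("SAVEPOINT one_row")
        try:
            conn.execute("INSERT INTO dataset_demo (uri) VALUES (?)", (uri,))
        except sqlite3.IntegrityError:
            # Undo only this insert; the outer transaction stays usable.
            conn.execute("ROLLBACK TO SAVEPOINT one_row")
        else:
            inserted += 1
        conn.execute("RELEASE SAVEPOINT one_row")
    conn.execute("COMMIT")
    return inserted


# isolation_level=None turns off the module's implicit BEGIN so the
# transaction and savepoints above are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE dataset_demo (uri TEXT PRIMARY KEY)")
count = insert_skipping_duplicates(conn, ["s3://a", "s3://b", "s3://a"])
rows = [r[0] for r in conn.execute("SELECT uri FROM dataset_demo ORDER BY uri")]
```

The duplicate costs one rolled-back savepoint, and the rows inserted
before and after it are still committed together.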

However, for PostgreSQL specifically there is a better approach: use
its `ON CONFLICT DO NOTHING` clause. This also allows us to do the
whole process in a single SQL statement (vs 1 select + 1 insert per row
for the slow path).
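
A rough sketch of the single-statement idea, again not the code from
this commit. It assumes SQLite (3.24 or newer, whose `ON CONFLICT`
syntax follows PostgreSQL's) via the stdlib `sqlite3` module as a
stand-in for Postgres, and a hypothetical `dataset_demo` table.

```python
import sqlite3


def insert_ignoring_conflicts(conn, uris):
    """Insert all URIs in one statement; rows that already exist are skipped."""
    uris = list(uris)
    if not uris:
        return
    # One "(?)" group per row, e.g. VALUES (?), (?), (?)
    placeholders = ", ".join("(?)" for _ in uris)
    sql = (
        f"INSERT INTO dataset_demo (uri) VALUES {placeholders} "
        "ON CONFLICT DO NOTHING"
    )
    with conn:  # commits on success, rolls back on error
        conn.execute(sql, uris)


conn2 = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn2.execute("CREATE TABLE dataset_demo (uri TEXT PRIMARY KEY)")
insert_ignoring_conflicts(conn2, ["s3://a", "s3://b"])
# Overlaps with an existing row: no IntegrityError, no savepoint needed.
insert_ignoring_conflicts(conn2, ["s3://b", "s3://c"])
rows2 = [r[0] for r in conn2.execute("SELECT uri FROM dataset_demo ORDER BY uri")]
```

No exception handling is needed here because the database resolves the
conflict itself rather than reporting it to the client.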

Report URL: https://github.com/apache/airflow/actions/runs/2972069866

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
