ashb opened a new pull request, #26103:
URL: https://github.com/apache/airflow/pull/26103

   Previously this was racy, so running multiple DAGs that all update the
   same outlet dataset (or, how I ran into this, mapped tasks) would cause
   some of them to fail with a unique constraint violation.
   
   The fix has two paths, one generic and an optimized version for
   Postgres.
   
   The generic one is likely slightly slower, and uses the pattern that the
   SQLA docs give for exactly this case. To quote:
   
   > This pattern is ideal for situations such as using PostgreSQL and
   > catching IntegrityError to detect duplicate rows; PostgreSQL normally
   > aborts the entire transaction when such an error is raised, however when
   > using SAVEPOINT, the outer transaction is maintained. In the example
   > below a list of data is persisted into the database, with the occasional
   > "duplicate primary key" record skipped, without rolling back the entire
   > operation:
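
The SAVEPOINT pattern described in the quote can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses the stdlib `sqlite3` module instead of SQLAlchemy (where the quoted docs use `Session.begin_nested()`) so that it runs without a database server, and the `queue` table, its columns, and the function name are hypothetical. SQLite does not abort the whole transaction on a constraint error the way PostgreSQL does, so the savepoint is here only to show the shape of the pattern.

```python
import sqlite3


def insert_skipping_duplicates(conn, rows):
    """Insert rows one at a time, skipping any that hit the unique constraint."""
    inserted = []
    conn.execute("BEGIN")  # outer transaction
    for row in rows:
        conn.execute("SAVEPOINT before_insert")
        try:
            conn.execute(
                "INSERT INTO queue (dataset_id, target_dag_id) VALUES (?, ?)", row
            )
        except sqlite3.IntegrityError:
            # Undo only the failed insert; the outer transaction survives.
            conn.execute("ROLLBACK TO SAVEPOINT before_insert")
        else:
            inserted.append(row)
        # The savepoint stays on the stack after ROLLBACK TO, so release it
        # in both the success and the failure case.
        conn.execute("RELEASE SAVEPOINT before_insert")
    conn.execute("COMMIT")
    return inserted


# isolation_level=None leaves transaction control to the explicit statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE queue ("
    " dataset_id INTEGER, target_dag_id TEXT,"
    " PRIMARY KEY (dataset_id, target_dag_id))"
)  # hypothetical table, for illustration only
rows = [(1, "dag_a"), (1, "dag_b"), (1, "dag_a")]
inserted = insert_skipping_duplicates(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

The third row duplicates the first, so only that insert is rolled back; the two earlier inserts are still committed with the outer transaction.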
   
   However, for PostgreSQL specifically there is a better approach we can
   take: use its `ON CONFLICT DO NOTHING` clause. This also allows us to
   do the whole process in a single SQL statement (vs 1 select + 1 insert
   per row for the slow path).
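
As a rough sketch of the single-statement idea (again, not the code from this PR): the example below runs an `INSERT ... ON CONFLICT DO NOTHING` through the stdlib `sqlite3` module, which accepts the same clause in SQLite 3.24 and later, so it can be tried without a PostgreSQL server. The `queue` table, its columns, and the function name are hypothetical. With SQLAlchemy against PostgreSQL, the equivalent construct would be `sqlalchemy.dialects.postgresql.insert(...).on_conflict_do_nothing()`.

```python
import sqlite3


def insert_ignoring_conflicts(conn, rows):
    """Insert all rows in one statement; rows that already exist are skipped."""
    if not rows:
        return
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    params = [value for row in rows for value in row]
    conn.execute(
        "INSERT INTO queue (dataset_id, target_dag_id) "
        f"VALUES {placeholders} ON CONFLICT DO NOTHING",
        params,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue ("
    " dataset_id INTEGER, target_dag_id TEXT,"
    " PRIMARY KEY (dataset_id, target_dag_id))"
)  # hypothetical table, for illustration only
# Pretend another writer got there first.
conn.execute("INSERT INTO queue VALUES (1, 'dag_a')")
conn.commit()

insert_ignoring_conflicts(conn, [(1, "dag_a"), (1, "dag_b"), (2, "dag_a")])
stored = conn.execute(
    "SELECT dataset_id, target_dag_id FROM queue ORDER BY dataset_id, target_dag_id"
).fetchall()
```

No exception handling or savepoint is needed here: the database itself skips the conflicting row, which is what removes the race between checking for a row and inserting it.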
   
   Fixes #25210


