SameerMesiah97 opened a new pull request, #67045: URL: https://github.com/apache/airflow/pull/67045
**Description**

This change adds a new `PostgresHook.upsert_rows` method that provides native PostgreSQL UPSERT support using `INSERT ... ON CONFLICT`. The new method supports configurable conflict targets through `conflict_fields` and selective updates through `update_fields`. When `update_fields` is omitted or empty, conflicting rows are ignored using `DO NOTHING`.

`upsert_rows` reuses the existing batching, transaction handling, serialization, and lineage behavior used by `insert_rows`, while introducing PostgreSQL-specific UPSERT semantics that are not currently exposed through the generic insert abstraction.

This PR is dependent on PR #66893 merging first.

**Rationale**

`DbApiHook.insert_rows` currently supports a generic `replace=True` abstraction delegated through dialect-specific SQL generation. However, PostgreSQL UPSERT semantics require additional concepts that are not representable through the existing API, including explicit conflict targets and selective update columns.

Supporting PostgreSQL-native UPSERT behavior through `insert_rows` would require introducing PostgreSQL-specific arguments such as `conflict_fields` and `update_fields` into the shared public `DbApiHook.insert_rows` API. Since `DbApiHook` is inherited broadly across providers, expanding the generic insert abstraction with provider-specific UPSERT semantics would increase API complexity and introduce ambiguous behavior for non-PostgreSQL hooks.

Adding a dedicated `PostgresHook.upsert_rows` method keeps PostgreSQL `ON CONFLICT` semantics explicit and self-contained while avoiding backwards compatibility and abstraction concerns in the shared `DbApiHook` interface.

The implementation uses PostgreSQL-native `INSERT ... ON CONFLICT` semantics rather than `MERGE`, since `ON CONFLICT` is the established and more broadly compatible UPSERT mechanism across supported PostgreSQL versions.
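As a rough illustration of the `ON CONFLICT` semantics described above, the sketch below builds the kind of statement such a method might generate. It is not the code from this PR: `build_upsert_sql` is a hypothetical helper, the `%s` placeholder style is assumed from psycopg2 conventions, and identifiers are interpolated without quoting to keep the example short.

```python
from __future__ import annotations

from collections.abc import Sequence


def build_upsert_sql(
    table: str,
    target_fields: Sequence[str],
    conflict_fields: Sequence[str],
    update_fields: Sequence[str] | None = None,
) -> str:
    """Hypothetical helper: build a parameterized INSERT ... ON CONFLICT statement.

    Assumption: identifiers are trusted and need no quoting. Real code
    should quote or validate them before interpolation.
    """
    if not target_fields:
        raise ValueError("target_fields must not be empty")
    if not conflict_fields:
        raise ValueError("conflict_fields must not be empty")

    columns = ", ".join(target_fields)
    placeholders = ", ".join(["%s"] * len(target_fields))
    conflict_target = ", ".join(conflict_fields)

    if update_fields:
        # EXCLUDED refers to the row that was proposed for insertion.
        assignments = ", ".join(f"{col} = EXCLUDED.{col}" for col in update_fields)
        action = f"DO UPDATE SET {assignments}"
    else:
        # No update columns given: silently skip conflicting rows.
        action = "DO NOTHING"

    return (
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) {action}"
    )
```

Calling `build_upsert_sql("users", ["id", "name"], ["id"], ["name"])` yields a `DO UPDATE SET name = EXCLUDED.name` clause, while omitting the last argument yields `DO NOTHING`. A composite conflict target is expressed by passing several names in `conflict_fields`.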
**Tests**

Added unit tests verifying that:

* Standard UPSERT operations correctly generate `ON CONFLICT DO UPDATE` SQL.
* UPSERT operations correctly support single and composite conflict fields.
* UPSERT operations correctly support single and multiple update fields.
* `DO NOTHING` behavior is generated when `update_fields` is omitted.
* `fast_executemany=True` uses `psycopg2.extras.execute_batch`.
* `commit_every` correctly chunks UPSERT operations across transactions.
* Empty row collections do not generate SQL or emit lineage.
* Empty or invalid `target_fields` and `conflict_fields` raise validation errors.

**Backwards Compatibility**

This change introduces a new provider-specific API and does not modify existing `insert_rows` behavior or shared `DbApiHook` interfaces.
