nihal111 commented on issue #12558:
URL: https://github.com/apache/iceberg/issues/12558#issuecomment-3551162202

   
   I ended up bypassing Iceberg’s match-action semantics entirely.  
   Instead of relying on `WHEN MATCHED AND …` inside the MERGE (which can still 
rewrite rows even when the condition is false), I compute my own CDC labels 
**before** the MERGE with a FULL OUTER JOIN.
   
   The pattern looks like this:
   
   ----------
   
   ### **1. Compute CDC using a FULL OUTER JOIN**
   
   You join the incoming data with the target table on the primary keys and 
mark each row as Insert / Update / Delete / Retain.
   
   ```SQL
   WITH filtered_target AS (
       SELECT *
       FROM target_table
   ),
   
   cdc_marked AS (
       SELECT
           /* coalesced primary keys */
           COALESCE(target.pk1, source.pk1) AS pk1,
           COALESCE(target.pk2, source.pk2) AS pk2,
           ...
   
           /* carry source columns forward */
           source.col1,
           source.col2,
           ...
   
           /* CDC classifier */
           CASE
               WHEN source.pk1 IS NULL AND source.pk2 IS NULL
                   THEN 'D'  -- exists only in target
               WHEN target.pk1 IS NULL AND target.pk2 IS NULL
                   THEN 'I'  -- exists only in source
               WHEN <change_condition>
                   THEN 'U'  -- exists in both, and values differ
               ELSE 'R'      -- unchanged
           END AS cdc
       FROM filtered_target AS target
       FULL OUTER JOIN source_view AS source
           ON target.pk1 IS NOT DISTINCT FROM source.pk1
          AND target.pk2 IS NOT DISTINCT FROM source.pk2
   )
   SELECT * FROM cdc_marked;
   ```
   
   This generates exactly the rows you intend to act on, with an explicit CDC 
label:
   
   -   `'I'` insert
       
   -   `'U'` update
       
   -   `'D'` delete
       
   -   `'R'` retain / unchanged (these should **not** touch the table)
       
   
   ----------
   
   ### **2. Drive MERGE logic using the CDC label**
   
   I then filter the changes to remove the 'R' records. Now the MERGE becomes 
deterministic and avoids Iceberg’s implicit “emit all” behavior.
   
   ```SQL
   MERGE INTO target_table AS target
   USING cdc_marked AS changes
   ON target.pk1 IS NOT DISTINCT FROM changes.pk1
      AND target.pk2 IS NOT DISTINCT FROM changes.pk2
   
   WHEN MATCHED AND changes.cdc = 'U' THEN
       UPDATE SET
           col1 = changes.col1,
           col2 = changes.col2,
           ...
   
   WHEN MATCHED AND changes.cdc = 'D' THEN
       DELETE
   
   WHEN NOT MATCHED AND changes.cdc = 'I' THEN
       INSERT (pk1, pk2, col1, col2, ...)
       VALUES (changes.pk1, changes.pk2, changes.col1, changes.col2, ...);
   ```
   
   This effectively sidesteps the unintended rewrites described in the issue.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to