Thanks, Steven, for sharing this. I will go through it. But I wanted to point out that my question does not involve multiple tables or multiple commits. On the Iceberg side, I still do a single MERGE operation, which completes the whole operation. The basic pseudocode looks like this:
step1 - read the table data, apply some transformations, and checkpoint the result using Spark.
step2 - commit to the table using a MERGE query.

Iceberg's optimistic concurrency control (OCC) works as follows. If the table's snapshot changes after step2 has started, and the new snapshot contains conflicting files, the MERGE query fails. Otherwise it goes through. My requirement is stricter: the entire process should fail if the snapshot has changed (with conflicting files) at any point after the table was first read in step1. So my idea was to read the current snapshot ID at the beginning of step1 and pass it to the MERGE query as the "validate-from-snapshot-id" option.

In fact, if I were using overwrite partitions, this exact thing is possible. I debugged the code and found that this behavior comes from https://github.com/apache/iceberg/blob/1.10.x/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L371. The question is: can I achieve the same for a MERGE query?

On Fri, Mar 27, 2026 at 9:17 PM Steven Wu <[email protected]> wrote:

> Somesh, it seems you are looking for multi-statement transaction behavior,
> which the community has discussed before.
>
> * design doc:
> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit
> * dev thread:
> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>
>
> On Fri, Mar 27, 2026 at 8:02 AM Somesh Dhal <[email protected]>
> wrote:
>
>> Can someone confirm if there's any way I can set
>> "validate_from_snapshot_id" for iceberg merge query? By default it picks up
>> the latest snapshot id of a table, but as I have a multi step process (
>> merge being the last step ), I want to set this explicitly. Was expecting
>> to set this as options as part of mergeinto dataframe api, but don't see
>> that option. I have tried setting this in sql conf for current thread but
>> this doesn't work. Upon debugging more, I could see that this is only read
>> from write conf for certain actions like overwrite partitions, dynamic
>> overwrite etc.
>> Is there no absolute way to achieve this for merge query/api?
>>
>
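The toy model below is not Iceberg code. It illustrates why the choice of validation start snapshot matters for the step1/step2 flow I described. It is a minimal Python sketch that assumes conflicts are detected per partition. Iceberg's real checks run in its Java core and look at data/delete files, the operation's filter, and the isolation level. `ToyTable`, `Snapshot`, `commit`, and `ToyConflictError` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


class ToyConflictError(Exception):
    """Raised when a commit overlaps a snapshot added after the validation start."""


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    partitions: frozenset[str]  # partitions whose files this snapshot changed


@dataclass
class ToyTable:
    history: list[Snapshot] = field(default_factory=list)  # oldest first

    def current_snapshot_id(self) -> Optional[int]:
        return self.history[-1].snapshot_id if self.history else None

    def snapshots_after(self, snapshot_id: Optional[int]) -> list[Snapshot]:
        """Snapshots committed after `snapshot_id` (all of them if None)."""
        if snapshot_id is None:
            return list(self.history)
        ids = [s.snapshot_id for s in self.history]
        return self.history[ids.index(snapshot_id) + 1:]

    def commit(self, partitions: set[str], validate_from_snapshot_id: Optional[int]) -> int:
        """Append a snapshot changing `partitions`. Fails if any snapshot newer
        than `validate_from_snapshot_id` changed an overlapping partition."""
        for snap in self.snapshots_after(validate_from_snapshot_id):
            overlap = snap.partitions & partitions
            if overlap:
                raise ToyConflictError(
                    f"snapshot {snap.snapshot_id} changed {sorted(overlap)}"
                )
        new_id = len(self.history) + 1
        self.history.append(Snapshot(new_id, frozenset(partitions)))
        return new_id


if __name__ == "__main__":
    table = ToyTable()
    table.commit({"day=01", "day=02"}, validate_from_snapshot_id=None)  # snapshot 1

    # step1: read the table and remember which snapshot was read
    step1_snapshot = table.current_snapshot_id()  # 1

    # a concurrent writer changes day=02 while step1 is still running
    table.commit({"day=02"}, validate_from_snapshot_id=table.current_snapshot_id())  # 2

    # step2: the MERGE plans its scan now, so by default it validates from here
    merge_start_snapshot = table.current_snapshot_id()  # 2

    try:
        table.commit({"day=02"}, validate_from_snapshot_id=step1_snapshot)
    except ToyConflictError as e:
        print("rejected:", e)  # rejected: snapshot 2 changed ['day=02']

    # validating only from the MERGE's own start snapshot misses the concurrent change
    print("committed:", table.commit({"day=02"}, validate_from_snapshot_id=merge_start_snapshot))
```

In this model, validating from the snapshot captured in step1 rejects the concurrent change to `day=02`. Validating from the MERGE's own start snapshot lets it through, which is the gap I am trying to close.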
