kbendick commented on PR #4795:
URL: https://github.com/apache/iceberg/pull/4795#issuecomment-1133796964

   Hi @CodingCat 
   
   I’m trying to better understand the problem you’re trying to solve.
   
   > Because currentSnapshot() will trigger the refresh of metadata and may show the snapshot id committed by someone else in another concurrent thread eventually
   
   As mentioned on Slack, the metadata refresh on commit is there to ensure that the table’s state is the same as it was when the write was prepared. This is how ACID compliance is achieved.
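
   To frame the race in the quote above, here is a toy sketch of the refresh-and-swap pattern behind optimistic concurrency. It is a hypothetical model, not Iceberg’s actual commit code: every name in it (`OptimisticCommitSketch`, `Snapshot`, `commit`) is made up for illustration, and the table’s "metadata" is reduced to a single atomically swapped pointer to the current snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of optimistic concurrency on a table's metadata pointer.
// All names are hypothetical; this is NOT Iceberg's API or implementation.
class OptimisticCommitSketch {

    // Immutable snapshot: its own id plus the id of the snapshot it was built on.
    static final class Snapshot {
        final long id;
        final Long parentId; // null for the very first snapshot

        Snapshot(long id, Long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // The single mutable pointer to the current snapshot, swapped atomically.
    private final AtomicReference<Snapshot> current = new AtomicReference<>(null);

    // Refresh (read the latest state), rebuild the change on top of it, and swap
    // the pointer only if no other writer committed in between; otherwise retry.
    // Returns the snapshot this writer actually committed.
    Snapshot commit(long newId) {
        while (true) {
            Snapshot base = current.get();                      // refresh
            Long parentId = (base == null) ? null : base.id;
            Snapshot updated = new Snapshot(newId, parentId);   // re-apply on latest base
            if (current.compareAndSet(base, updated)) {         // atomic swap
                return updated;
            }
            // Another writer won the race: loop, refresh, and retry.
        }
    }

    // Returns whatever is current *now*, which may be another writer's snapshot.
    Snapshot currentSnapshot() {
        return current.get();
    }

    public static void main(String[] args) {
        OptimisticCommitSketch table = new OptimisticCommitSketch();
        Snapshot mine = table.commit(1L);  // "our" writer commits
        table.commit(2L);                  // another writer commits afterwards
        System.out.println("committed by us: " + mine.id);
        System.out.println("current now: " + table.currentSnapshot().id);
        System.out.println("parent of current: " + table.currentSnapshot().parentId);
    }
}
```

   Running `main` prints 1 for our own commit but 2 for the current snapshot, whose parent is 1: reading "current" after committing reflects the latest commit by anyone, not necessarily ours. In this toy model the swap is only safe because `compareAndSet` is atomic; without an atomic swap or a lock, two writers could both read the same base and one would silently overwrite the other.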
   
   I’m not sure I understand what you’re trying to achieve. I know you’d like to expose the snapshot ID as it was when the current thread (or, more generally, the writer) prepared its write, i.e. prior to the commit. But what do you intend to do with that information?
   
   > I think the scenario is more pervasive than our own case, e.g. each notebook attached to the Databricks' notebook cluster is basically handled by a thread. In such an scenario, users may fall into some race condition to get the snapshot id committed by their own notebook with just currentSnapshot().snapshotId
   
   What catalog are you using? You mention Databricks, and most people I’ve encountered using Iceberg on Databricks are using the `HadoopCatalog`, which should _not_ be used in a production environment because there’s no locking mechanism to ensure that only one writer at a time can update the current snapshot (be it across threads or across Spark applications).
   
   It sounds like you may be trying to work around the lack of a lock, but I worry that you’ll end up with conflicting writes that clobber the previous writer’s work.
   
   What do you intend to do with this thread-local snapshot ID, particularly once it becomes outdated because some other writer has committed?
   