rdblue commented on issue #6514:
URL: https://github.com/apache/iceberg/issues/6514#issuecomment-1369254153

   @fqaiser94, in general I think this is a good idea, but I'm not sure there 
are very many use cases for it besides deduplicating high-level operations. I 
also think that using this as a way to make table properties transactional is 
probably a bad idea, but it's been requested in the past so we should probably 
have an approved way to accomplish it.
   
   Table properties purposely don't have transactional guarantees, so that 
they aren't used to coordinate state. Table properties are supposed to be used to 
configure the table, not to hold important state. What I recommend to 
accomplish the use case you're talking about is putting the watermark in 
snapshot properties instead of table properties. That's what we do for Flink 
commits and we get exactly-once behavior, although the check for the watermark 
is done outside of the commit path. Concurrent Flink writes would use different 
watermark properties because they use watermarks that are job-specific.
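
   A rough sketch of that pattern, written against plain Java stand-ins rather than the Iceberg API. `Snap`, `Table`, `commitIfNew`, and the `example.watermark.<jobId>` key are all hypothetical names made up for illustration. In Iceberg itself the property would be set with `SnapshotUpdate.set(property, value)` and read back from `Snapshot.summary()`.

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotWatermarkSketch {
  // Hypothetical stand-in for a snapshot: id, parent id, and a string summary.
  static final class Snap {
    final long id;
    final Long parentId; // null for the first snapshot
    final Map<String, String> summary;

    Snap(long id, Long parentId, Map<String, String> summary) {
      this.id = id;
      this.parentId = parentId;
      this.summary = summary;
    }
  }

  // Hypothetical in-memory table: snapshots by id plus a pointer to the current one.
  static final class Table {
    final Map<Long, Snap> snapshots = new HashMap<>();
    Long currentId = null;
    private long nextId = 1;

    // Each commit creates a new snapshot whose summary carries the given properties.
    Snap commit(Map<String, String> summary) {
      Snap snap = new Snap(nextId++, currentId, new HashMap<>(summary));
      snapshots.put(snap.id, snap);
      currentId = snap.id;
      return snap;
    }
  }

  // Job-specific key (an assumed naming scheme), so concurrent jobs never share a watermark.
  static String watermarkKey(String jobId) {
    return "example.watermark." + jobId;
  }

  // Walk back from the current snapshot to the latest watermark for this job; -1 if none.
  static long lastCommittedWatermark(Table table, String jobId) {
    String key = watermarkKey(jobId);
    Long id = table.currentId;
    while (id != null) {
      Snap snap = table.snapshots.get(id);
      String value = snap.summary.get(key);
      if (value != null) {
        return Long.parseLong(value);
      }
      id = snap.parentId;
    }
    return -1L;
  }

  // The watermark check runs before the commit, outside the commit path itself.
  static boolean commitIfNew(Table table, String jobId, long watermark) {
    if (watermark <= lastCommittedWatermark(table, jobId)) {
      return false; // already committed, so skip the replay
    }
    Map<String, String> summary = new HashMap<>();
    summary.put(watermarkKey(jobId), Long.toString(watermark));
    table.commit(summary);
    return true;
  }

  public static void main(String[] args) {
    Table table = new Table();
    System.out.println(commitIfNew(table, "job-a", 5)); // first attempt
    System.out.println(commitIfNew(table, "job-a", 5)); // replay of the same watermark
    System.out.println(commitIfNew(table, "job-b", 3)); // different job, different key
  }
}
```

   Because the key includes the job id, a commit from one job never hides or overwrites another job's watermark. The sketch does nothing to guard against two writers of the same job racing between the check and the commit.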
   
   It's a good idea to provide a custom validation that can do any check you 
want. For example, your Kafka example could create watermarks based on some 
chunk of time that is being processed and the custom validation could check the 
last few snapshots to see whether another process has already committed. That's 
a good use case.
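
   As an illustration only, such a check could look roughly like the following. It models each snapshot as nothing more than its summary map, and `example.processed-chunk`, `chunkNotYetCommitted`, and the lookback window are hypothetical choices, not anything that exists in Iceberg.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecentSnapshotCheckSketch {
  // Hypothetical summary key recording which chunk of time a commit covered.
  static final String CHUNK_KEY = "example.processed-chunk";

  // Returns true when none of the most recent `lookback` snapshots recorded `chunk`.
  // `summaries` holds one summary map per snapshot, ordered oldest to newest.
  static boolean chunkNotYetCommitted(
      List<Map<String, String>> summaries, String chunk, int lookback) {
    int start = Math.max(0, summaries.size() - lookback);
    for (int i = summaries.size() - 1; i >= start; i--) {
      if (chunk.equals(summaries.get(i).get(CHUNK_KEY))) {
        return false; // another process already committed this chunk
      }
    }
    return true;
  }

  // Helper that builds a summary map; a null chunk models an unrelated commit.
  static Map<String, String> summaryFor(String chunk) {
    Map<String, String> summary = new HashMap<>();
    if (chunk != null) {
      summary.put(CHUNK_KEY, chunk);
    }
    return summary;
  }

  public static void main(String[] args) {
    List<Map<String, String>> history = Arrays.asList(
        summaryFor("2023-01-02T10:00"),
        summaryFor(null),
        summaryFor("2023-01-02T11:00"));
    System.out.println(chunkNotYetCommitted(history, "2023-01-02T11:00", 3));
    System.out.println(chunkNotYetCommitted(history, "2023-01-02T12:00", 3));
  }
}
```

   The lookback window is a trade-off: a small window keeps the check cheap, but a chunk committed further back than the window would go unnoticed.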
   
   To do this, I'd probably take a slightly different approach than the one 
you've implemented. I'd add a `validate(Predicate<TableMetadata> current)` method to 
either `SnapshotUpdate` or the more general `PendingUpdate`. That way each 
table operation can have its own custom validation against the current table 
state. Using a transaction would automatically check all of the custom 
validations for each operation, so there would be no need to alter 
`Transaction`.
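
   To make the shape of that proposal concrete, here is a minimal sketch in plain Java. It is not the Iceberg implementation: `Metadata`, `Table`, and `PendingAppend` are simplified, hypothetical stand-ins for `TableMetadata`, the table, and a `PendingUpdate`, and the commit logic is reduced to a single in-memory swap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ValidatingUpdateSketch {
  // Hypothetical, heavily simplified stand-in for table metadata.
  static final class Metadata {
    final List<String> snapshotNotes; // one note per snapshot, oldest first

    Metadata(List<String> snapshotNotes) {
      this.snapshotNotes = snapshotNotes;
    }
  }

  // Hypothetical in-memory table holding the current metadata.
  static final class Table {
    Metadata current = new Metadata(new ArrayList<>());
  }

  // Hypothetical pending update that collects custom validations.
  static final class PendingAppend {
    private final Table table;
    private final String note;
    private final List<Predicate<Metadata>> validations = new ArrayList<>();

    PendingAppend(Table table, String note) {
      this.table = table;
      this.note = note;
    }

    // Registers a check that must hold for the table state seen at commit time.
    PendingAppend validate(Predicate<Metadata> check) {
      validations.add(check);
      return this;
    }

    // Runs every validation against the current state, then applies the change.
    void commit() {
      Metadata base = table.current; // state at commit time, not at creation time
      for (Predicate<Metadata> check : validations) {
        if (!check.test(base)) {
          throw new IllegalStateException("Custom validation failed");
        }
      }
      List<String> notes = new ArrayList<>(base.snapshotNotes);
      notes.add(note);
      table.current = new Metadata(notes);
    }
  }

  public static void main(String[] args) {
    Table table = new Table();
    Predicate<Metadata> notDone = m -> !m.snapshotNotes.contains("chunk-1");
    PendingAppend first = new PendingAppend(table, "chunk-1").validate(notDone);
    PendingAppend second = new PendingAppend(table, "chunk-1").validate(notDone);
    first.commit();
    try {
      second.commit();
    } catch (IllegalStateException e) {
      System.out.println("second commit rejected: " + e.getMessage());
    }
  }
}
```

   The point of the design is that the predicate is evaluated when the operation commits, so two writers that both started from the same state cannot both pass the check.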


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

