aiborodin commented on issue #14425:
URL: https://github.com/apache/iceberg/issues/14425#issuecomment-3484180614

   @pvary We can't use `conflictDetectionFilter()` because it operates on a 
per-record basis and doesn't expose the `_file` column, which we would need to 
identify duplicate add/delete files coming from concurrent commits. Adding this 
column would also require changing the public API and would result in a much 
more complex filter condition passed from the Flink job.
   All existing validation methods rely on predefined conditions, so none of 
them can express this check.
   
   We need a simple check of the `Snapshot`'s summary to identify duplicate 
commits using the `flink.max-committed-checkpoint-id` property. There's no way 
to do this using the existing API.
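
   To illustrate the kind of check meant here, below is a minimal, self-contained sketch. It is not the code from either PR: the class and method names are hypothetical, and plain `Map<String, String>` values stand in for the summaries of the snapshots committed since the writer's starting snapshot. It also ignores which Flink job wrote each snapshot, which a real check would have to take into account.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
public final class DuplicateCheckpointCheck {
  // Summary property named in this comment.
  static final String MAX_COMMITTED_CHECKPOINT_ID = "flink.max-committed-checkpoint-id";

  private DuplicateCheckpointCheck() {}

  // Returns true if any of the given snapshot summaries records a committed
  // checkpoint id greater than or equal to the one we are about to commit.
  static boolean alreadyCommitted(List<Map<String, String>> summaries, long checkpointId) {
    for (Map<String, String> summary : summaries) {
      String value = summary.get(MAX_COMMITTED_CHECKPOINT_ID);
      // Snapshots written by other engines will not carry the property; skip them.
      if (value != null && Long.parseLong(value) >= checkpointId) {
        return true;
      }
    }
    return false;
  }
}
```

   A validation hook would run this against the snapshots added by concurrent commits and fail the commit when it returns `true`.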
   
   I understand the concern about changing the core API, and I am happy to get 
others' opinions on this. I raised an alternative solution that leaves the core 
APIs unmodified (apart from a one-line change) and instead extends 
`BaseRowDelta` and `BaseReplacePartitions`: 
https://github.com/apache/iceberg/pull/14484. 
However, in this solution, we have to manually instantiate the subclasses 
`FlinkRowDelta` and `FlinkReplacePartitions`, and require the table to implement 
`HasTableOperations` so that we can access `TableOperations`. In contrast, the 
first solution (https://github.com/apache/iceberg/pull/14445) cleanly retrieves 
these operations from the table API.
   
   I personally favour the first solution of adding new public validation 
methods (https://github.com/apache/iceberg/pull/14445), because it seems 
generic enough to be useful for other applications where clients want custom 
validation, for example, based on `Snapshot` properties. But I am also okay 
with the second option of inheriting from the public APIs.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

