jasonf20 opened a new pull request, #14023:
URL: https://github.com/apache/iceberg/pull/14023

   In distributed computing use cases, you may have a job that should add data 
to the table only once. However, due to the nature of distributed computing, the 
job may execute more than once on different instances (Server A, Server B).
   Each server will write its own Parquet files (e.g. `<job_id>/A.parquet`, 
`<job_id>/B.parquet`). They won’t use the same file name, to avoid issues if the 
files are not exact binary matches.
   
   When the job is about to commit data, it should validate that no other files 
for the same job id are already committed.
   The above is just an example, and the types of validations can differ 
depending on specific writer patterns. To solve this for any writer pattern, 
this PR adds support for custom `commitValidators` that will allow users to set 
any validation logic on the base table metadata on which the change will be 
applied.
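
   As a rough illustration of the job-id check described above, here is a 
minimal, self-contained sketch. It does not use the Iceberg API or the 
interface added in this PR: `CommitValidator`, `noFilesForJob`, and `commit` 
are hypothetical names, the base table metadata is reduced to a plain list of 
committed file paths, and files are assumed to be laid out as 
`<job_id>/<name>.parquet`.

```java
import java.util.ArrayList;
import java.util.List;

class CommitValidatorSketch {

  /** Hypothetical validator: inspects the base state and throws to reject the commit. */
  interface CommitValidator {
    void validate(List<String> baseFilePaths);
  }

  /** Rejects the commit if the base state already contains a file written by jobId. */
  static CommitValidator noFilesForJob(String jobId) {
    String prefix = jobId + "/"; // assumes paths look like <job_id>/<name>.parquet
    return baseFilePaths -> {
      for (String path : baseFilePaths) {
        if (path.startsWith(prefix)) {
          throw new IllegalStateException(
              "Job " + jobId + " already committed file " + path);
        }
      }
    };
  }

  /** Runs every validator against the base state, then returns the state with newFiles added. */
  static List<String> commit(
      List<String> baseFilePaths, List<String> newFiles, List<CommitValidator> validators) {
    for (CommitValidator validator : validators) {
      validator.validate(baseFilePaths);
    }
    List<String> updated = new ArrayList<>(baseFilePaths);
    updated.addAll(newFiles);
    return List.copyOf(updated);
  }

  public static void main(String[] args) {
    // Server A commits first; the base state has no files for job-1 yet.
    List<String> afterA =
        commit(List.of(), List.of("job-1/A.parquet"), List.of(noFilesForJob("job-1")));

    // Server B retries the same job against the refreshed base state and is rejected.
    try {
      commit(afterA, List.of("job-1/B.parquet"), List.of(noFilesForJob("job-1")));
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

   The important property in this sketch is that the validators run against 
the base state the change is about to be applied to, so a retry after a 
concurrent commit sees the other server's files and fails instead of adding 
duplicate data.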


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

