jasonf20 opened a new pull request, #14023: URL: https://github.com/apache/iceberg/pull/14023
In distributed computing use cases, you may have a job that should add data to the table only once. However, due to the nature of distributed computing, the job may execute more than once on different instances (Server A, Server B). Each server will write its own parquet files (e.g. `<job_id>/A.parquet`, `<job_id>/B.parquet`). They won't use the same file name, to avoid issues if the files are not exact binary matches. When the job is about to commit data, it should validate that no other files for the same job id are already committed. The above is just an example, and the types of validations can differ depending on specific writer patterns.

To solve this for any writer pattern, this PR adds support for custom `commitValidators` that allow users to run any validation logic against the base table metadata on which the change will be applied.
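As a rough illustration of the job id check described in this PR, here is a minimal, self-contained Java sketch. It deliberately does not use Iceberg classes and does not reproduce the API added by this PR: the `CommitValidator` interface, the `noFilesForJob` and `wouldCommit` helpers, and the use of a plain list of relative file paths as a stand-in for the base table metadata are all hypothetical names and assumptions made for illustration.

```java
import java.util.List;

public class JobCommitValidationSketch {

    /** Hypothetical stand-in for a custom commit validator; NOT the interface added by the PR. */
    @FunctionalInterface
    interface CommitValidator {
        /** Throws IllegalStateException if the commit must not proceed. */
        void validate(List<String> committedFilePaths);
    }

    /**
     * Builds a validator that rejects the commit if any already committed file
     * lives under "<jobId>/". Assumes paths are relative, e.g. "job-42/A.parquet".
     */
    static CommitValidator noFilesForJob(String jobId) {
        String prefix = jobId + "/";
        return committed -> {
            for (String path : committed) {
                if (path.startsWith(prefix)) {
                    throw new IllegalStateException(
                        "Job " + jobId + " already committed file: " + path);
                }
            }
        };
    }

    /** Returns true if the validator accepts the given base state, false if it rejects it. */
    static boolean wouldCommit(CommitValidator validator, List<String> committedFilePaths) {
        try {
            validator.validate(committedFilePaths);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CommitValidator validator = noFilesForJob("job-42");

        // Server A commits first: the base state holds no files for job-42.
        System.out.println(wouldCommit(validator, List.of("job-41/A.parquet")));

        // Server B runs the same job after A's commit has landed.
        System.out.println(
            wouldCommit(validator, List.of("job-41/A.parquet", "job-42/A.parquet")));
    }
}
```

In this sketch the first check accepts the commit and the second rejects it, because a file under `job-42/` is already present in the base state. In a real table the validator would need to run against the refreshed base metadata on every commit retry, so that a concurrent commit from the other server is seen before the change is applied.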