[ https://issues.apache.org/jira/browse/GOBBLIN-1994?focusedWorklogId=902436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-902436 ]
ASF GitHub Bot logged work on GOBBLIN-1994:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 29/Jan/24 22:35
Start Date: 29/Jan/24 22:35
Worklog Time Spent: 10m
Work Description: phet opened a new pull request, #3870:
URL: https://github.com/apache/gobblin/pull/3870
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I
have checked off all the steps below!
### JIRA
- [ ] My PR addresses the following [Gobblin
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
- https://issues.apache.org/jira/browse/GOBBLIN-1994
### Description
- [ ] Here are some details about my PR, including screenshots (if
applicable):
Iceberg-distcp replication currently risks introducing inconsistency into the
destination (replication-target) table, because the current snapshot version of
either table may change between the planning phase (when WorkUnits are created)
and the final post-copy commit to the destination table.
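To make the race concrete, here is a minimal, language-neutral sketch in Python (Gobblin itself is Java). It assumes a toy model in which a table's metadata is just a version number plus a set of data files. `TableMetadata`, `Table`, and `naive_replicate` are hypothetical names for illustration only. This `TableMetadata` is not Iceberg's class, and none of this is Gobblin code. The naive replicator plans and copies from one source snapshot but re-reads the source at commit time, so a concurrent source commit leaves the destination referencing a file that was never copied.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical stand-in for a table's current snapshot: a version plus its data files."""
    version: int
    data_files: FrozenSet[str]


class Table:
    """Hypothetical in-memory table whose current metadata is swapped on each commit."""

    def __init__(self, metadata: TableMetadata) -> None:
        self.current = metadata

    def append(self, new_file: str) -> None:
        # A concurrent writer producing a new snapshot that adds one data file.
        self.current = TableMetadata(self.current.version + 1, self.current.data_files | {new_file})


def naive_replicate(source: Table, dest: Table, copied_files: Set[str],
                    between_copy_and_commit: Callable[[], None] = lambda: None) -> None:
    planned = source.current                 # planning phase: list the source table once
    copied_files.update(planned.data_files)  # copy phase: copy exactly the planned files
    between_copy_and_commit()                # window in which another writer may commit
    dest.current = source.current            # BUG: re-reads source metadata at commit time


source = Table(TableMetadata(1, frozenset({"a.parquet", "b.parquet"})))
dest = Table(TableMetadata(0, frozenset()))
copied: Set[str] = set()
naive_replicate(source, dest, copied, lambda: source.append("c.parquet"))
print(sorted(dest.current.data_files - copied))  # ['c.parquet']: referenced by dest, never copied
```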
To protect against concurrent commits, ensure correctness by committing with:
1. the same source-side metadata observed while first listing the source
table during the planning phase, and
2. the same dest-side metadata observed just prior to that first listing of
the source table.
This guarantees that:
1. any source table metadata modified later would not be committed to dest
(at least not until a subsequent execution), and
2. any interim commit to the dest table would cause the distcp replication
commit to fail (which may then be resolved by launching a new iceberg-distcp
execution on the same table)
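The two guarantees above can be sketched with the same kind of toy Python model. This is a sketch under stated assumptions, not the PR's implementation. `TableMetadata`, `Table`, `CommitFailedError`, and `consistent_replicate` are hypothetical names. The `commit(base, new)` check is only loosely modeled on the contract of Iceberg's `TableOperations.commit(base, metadata)`, which requires that `base` still be the current metadata. The replicator reads the dest metadata (2) and then the source metadata (1) once, during planning. It commits exactly the planned source metadata against that observed dest base.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical stand-in for a table's current snapshot: a version plus its data files."""
    version: int
    data_files: FrozenSet[str]


class CommitFailedError(Exception):
    """Hypothetical error raised when the destination changed after it was observed."""


class Table:
    """Hypothetical in-memory table with an atomic compare-and-swap commit."""

    def __init__(self, metadata: TableMetadata) -> None:
        self.current = metadata

    def append(self, new_file: str) -> None:
        # A concurrent writer producing a new snapshot that adds one data file.
        self.current = TableMetadata(self.current.version + 1, self.current.data_files | {new_file})

    def commit(self, base: TableMetadata, new: TableMetadata) -> None:
        # Swap in `new` only if nobody has committed since `base` was read
        # (object identity stands in for comparing metadata file locations).
        if self.current is not base:
            raise CommitFailedError(f"dest moved from v{base.version} to v{self.current.version}")
        self.current = new


def consistent_replicate(source: Table, dest: Table, copied_files: Set[str],
                         between_copy_and_commit: Callable[[], None] = lambda: None) -> None:
    dest_base = dest.current                 # (2) dest-side metadata, read just before listing the source
    planned = source.current                 # (1) source-side metadata, read once while planning
    copied_files.update(planned.data_files)  # copy exactly the planned files
    between_copy_and_commit()                # window in which another writer may commit
    dest.commit(dest_base, planned)          # commit what was planned, against the observed base


# Guarantee 1: a concurrent source commit is not replicated (until a later run).
source = Table(TableMetadata(1, frozenset({"a.parquet", "b.parquet"})))
dest = Table(TableMetadata(0, frozenset()))
copied: Set[str] = set()
consistent_replicate(source, dest, copied, lambda: source.append("c.parquet"))
print(sorted(dest.current.data_files))  # ['a.parquet', 'b.parquet']

# Guarantee 2: a concurrent dest commit makes the replication commit fail.
dest2 = Table(TableMetadata(0, frozenset()))
try:
    consistent_replicate(source, dest2, set(), lambda: dest2.append("x.parquet"))
except CommitFailedError as err:
    print("commit rejected:", err)  # commit rejected: dest moved from v0 to v1
```

In this model, a failed commit leaves the destination exactly as the concurrent writer left it. Rerunning the replication then plans against fresh metadata, which matches the suggested resolution of launching a new execution.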
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for
this extremely good reason:
updated unit tests
### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I
have squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
Issue Time Tracking
-------------------
Worklog Id: (was: 902436)
Remaining Estimate: 0h
Time Spent: 10m
> Ensure iceberg-distcp replication consistency
> ---------------------------------------------
>
> Key: GOBBLIN-1994
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1994
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-core
> Reporter: Kip Kohn
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Ensure iceberg-distcp consistency by using the same `TableMetadata` for both WU
> planning and final commit
--
This message was sent by Atlassian Jira
(v8.20.10#820010)