[
https://issues.apache.org/jira/browse/GOBBLIN-2088?focusedWorklogId=923844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923844
]
ASF GitHub Bot logged work on GOBBLIN-2088:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jun/24 08:18
Start Date: 18/Jun/24 08:18
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3976:
URL: https://github.com/apache/gobblin/pull/3976#discussion_r1644032306
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergRegisterStep.java:
##########
@@ -85,11 +91,20 @@ public void execute() throws IOException {
if (!isJustPriorDestMetadataStillCurrent) {
throw new IOException("error: likely concurrent writing to
destination: " + determinationMsg);
}
- destIcebergTable.registerIcebergTable(readTimeSrcTableMetadata,
currentDestMetadata);
+ Retryer<Void> registerRetryer = RetryerBuilder.<Void>newBuilder()
+ .retryIfException()
+
.withStopStrategy(StopStrategies.stopAfterAttempt(MAX_NUMBER_OF_ATTEMPTS)).build();
+ registerRetryer.call(() -> {
+ destIcebergTable.registerIcebergTable(readTimeSrcTableMetadata,
currentDestMetadata);
+ return null;
+ });
} catch (IcebergTable.TableNotFoundException tnfe) {
String msg = "Destination table (with TableMetadata) does not exist: " +
tnfe.getMessage();
log.error(msg);
throw new IOException(msg, tnfe);
+ } catch (ExecutionException | RetryException e) {
Review Comment:
see the comment on line 90 about:
> org.apache.iceberg.exceptions.CommitFailedException: Cannot commit: stale
table metadata
we should probably detect that one and give up retrying (potentially with a
clearer error message).
moreover, `ExecutionException` and `RetryException` are separate
circumstances and probably shouldn't be clubbed into the same handler. see
[one example
here](https://github.com/apache/gobblin/blob/0d4ee241bef8d68f257f91ebd878e0b39698f68c/gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java#L248).
and, do unwrap before re-throwing.
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergRegisterStep.java:
##########
@@ -85,11 +91,20 @@ public void execute() throws IOException {
if (!isJustPriorDestMetadataStillCurrent) {
throw new IOException("error: likely concurrent writing to
destination: " + determinationMsg);
}
- destIcebergTable.registerIcebergTable(readTimeSrcTableMetadata,
currentDestMetadata);
+ Retryer<Void> registerRetryer = RetryerBuilder.<Void>newBuilder()
+ .retryIfException()
+
.withStopStrategy(StopStrategies.stopAfterAttempt(MAX_NUMBER_OF_ATTEMPTS)).build();
Review Comment:
in general, we try to avoid hard-coding config into our jobs. see
`KafkaJobStatusMonitor` for [an
example](https://github.com/apache/gobblin/blob/0d4ee241bef8d68f257f91ebd878e0b39698f68c/gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java#L138)
of using a config prefix, then *falling back to* some hard-coded defaults. in
addition, a `RetryListener` may be helpful toward leaving an indication in the
logs of why we're retrying
Issue Time Tracking
-------------------
Worklog Id: (was: 923844)
Time Spent: 50m (was: 40m)
> Add retries to OH replication final catalog commit
> --------------------------------------------------
>
> Key: GOBBLIN-2088
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2088
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Vivek Rai
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)