Copilot commented on code in PR #59760:
URL: https://github.com/apache/doris/pull/59760#discussion_r2681117255
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java:
##########
@@ -516,10 +516,18 @@ protected void fetchMeta() {
offsetProvider.fetchRemoteMeta(new HashMap<>());
}
} catch (Exception ex) {
- //todo: The job status = MANUAL_PAUSE_ERR, No need to set
failureReason again
log.warn("fetch remote meta failed, job id: {}", getJobId(), ex);
- failureReason = new
FailureReason(InternalErrorCode.GET_REMOTE_DATA_ERROR,
- "Failed to fetch meta, " + ex.getMessage());
+ if (this.getFailureReason() != null
+ &&
!InternalErrorCode.MANUAL_PAUSE_ERR.equals(this.getFailureReason().getCode())) {
Review Comment:
The condition logic is inverted. This checks if failureReason is NOT null
and is NOT MANUAL_PAUSE_ERR, which means it will skip setting the failure
reason when there's already an error. The logic should be checking if
failureReason is null OR if it's not MANUAL_PAUSE_ERR. The correct condition
should be: `if (this.getFailureReason() == null ||
!InternalErrorCode.MANUAL_PAUSE_ERR.equals(this.getFailureReason().getCode()))`.
Otherwise, legitimate failures after an initial error will be silently ignored.
```suggestion
if (this.getFailureReason() == null
||
!InternalErrorCode.MANUAL_PAUSE_ERR.equals(this.getFailureReason().getCode())) {
```
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java:
##########
@@ -505,7 +505,7 @@ public List<AbstractStreamingTask> queryAllStreamTasks() {
return tasks;
}
- protected void fetchMeta() {
+ protected void fetchMeta() throws JobException {
Review Comment:
The method signature declares that it throws JobException, but all
exceptions are caught internally and handled. The method never actually throws
JobException. Either remove the throws clause from the method signature, or
propagate the exception instead of catching it. This misleading signature can
confuse callers about the method's behavior.
```suggestion
protected void fetchMeta() {
```
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java:
##########
@@ -516,10 +516,18 @@ protected void fetchMeta() {
offsetProvider.fetchRemoteMeta(new HashMap<>());
}
} catch (Exception ex) {
- //todo: The job status = MANUAL_PAUSE_ERR, No need to set
failureReason again
log.warn("fetch remote meta failed, job id: {}", getJobId(), ex);
- failureReason = new
FailureReason(InternalErrorCode.GET_REMOTE_DATA_ERROR,
- "Failed to fetch meta, " + ex.getMessage());
+ if (this.getFailureReason() != null
+ &&
!InternalErrorCode.MANUAL_PAUSE_ERR.equals(this.getFailureReason().getCode())) {
+ // When a job is manually paused, it does not need to be set
again,
+ // otherwise, it may be woken up by auto resume.
+ this.setFailureReason(
+ new
FailureReason(InternalErrorCode.GET_REMOTE_DATA_ERROR,
+ "Failed to fetch meta, " + ex.getMessage()));
+ // If fetching meta fails, the job is paused
+ // and auto resume will automatically wake it up.
+ this.updateJobStatus(JobStatus.PAUSED);
+ }
Review Comment:
The new error handling logic for fetch meta failures lacks test coverage.
This code path involves complex conditional logic around failure reasons and
auto-resume behavior that should be verified with unit tests. Consider adding
tests that cover scenarios like: 1) fetch meta fails when no prior failure
exists, 2) fetch meta fails when MANUAL_PAUSE_ERR is already set, 3) fetch meta
fails when a different error code is already set.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]