LinBinJ commented on PR #10541:
URL:
https://github.com/apache/dolphinscheduler/pull/10541#issuecomment-1842025205
> > Why was this piece of code deleted? This deletion causes an issue where,
> > during retries, a new instance isn't generated. As a result, the logs of the
> > old instance become inaccessible. More critically, it fails to update the
> > 'startTime', causing the newly retried instance to immediately time out and
> > fail upon startup.
> > ```
> > if (taskInstance.getState().typeIsFailure()) {
> >     if (taskInstance.isSubProcess()) {
> >         taskInstance.setRetryTimes(taskInstance.getRetryTimes() + 1);
> >     } else {
> >         if (processInstanceState != ExecutionStatus.READY_STOP
> >                 && processInstanceState != ExecutionStatus.READY_PAUSE) {
> >             // failure task set invalid
> >             taskInstance.setFlag(Flag.NO);
> >             updateTaskInstance(taskInstance);
> >             // create new task instance
> >             if (taskInstance.getState() != ExecutionStatus.NEED_FAULT_TOLERANCE) {
> >                 taskInstance.setRetryTimes(taskInstance.getRetryTimes() + 1);
> >             }
> >             taskInstance.setSubmitTime(null);
> >             taskInstance.setLogPath(null);
> >             taskInstance.setExecutePath(null);
> >             taskInstance.setStartTime(null);
> >             taskInstance.setEndTime(null);
> >             taskInstance.setFlag(Flag.YES);
> >             taskInstance.setHost(null);
> >             taskInstance.setId(0);
> >         }
> >     }
> > }
> > if (processInstanceState.typeIsFinished()
> >         || processInstanceState == ExecutionStatus.READY_STOP) {
> >     logger.warn("processInstance {} was {}, skip submit task",
> >             processInstance.getProcessDefinitionCode(), processInstanceState);
> >     return null;
> > }
> > if (processInstanceState == ExecutionStatus.READY_PAUSE) {
> >     taskInstance.setState(ExecutionStatus.PAUSE);
> > }
> > ```
>
> After reviewing this, I don't think it is a bug. The change replaces the
> previous logic, which created a new task instance on each retry, with a retry
> in the same task instance, so the log does not disappear. The start time
> should not be reset as part of this. What is missing is documentation
> describing the new behavior.
Assume a task has a timeout of 30 minutes, 3 retries, and a retry interval of
10 minutes. If the first attempt fails after 20 minutes, the timeout check for
the subsequent two retries fires because it is still measured from the original
startTime, and the retries fail. Therefore, I think startTime should be updated
when the task is retried.
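
To make the arithmetic in that scenario concrete, here is a minimal, self-contained sketch. It is not DolphinScheduler code: the class `RetryTimeoutSketch`, the helper `isTimedOut`, and the fixed timestamps are hypothetical, and it assumes the timeout check compares the time elapsed since startTime against the timeout with an inclusive bound.

```java
import java.time.Duration;
import java.time.Instant;

public class RetryTimeoutSketch {

    // Hypothetical check: true once the time elapsed since startTime
    // has reached the timeout (inclusive bound is an assumption).
    static boolean isTimedOut(Instant startTime, Instant now, Duration timeout) {
        return !Duration.between(startTime, now).minus(timeout).isNegative();
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofMinutes(30);
        Duration retryInterval = Duration.ofMinutes(10);

        // Arbitrary fixed timestamp so the example is deterministic.
        Instant firstStart = Instant.parse("2023-01-01T00:00:00Z");
        Instant firstFailure = firstStart.plus(Duration.ofMinutes(20));
        Instant retryStart = firstFailure.plus(retryInterval);

        // startTime kept from the first attempt: 20 + 10 = 30 minutes
        // have already elapsed when the retry begins.
        boolean keptStartTime = isTimedOut(firstStart, retryStart, timeout);

        // startTime reset when the retry begins: nothing has elapsed yet.
        boolean resetStartTime = isTimedOut(retryStart, retryStart, timeout);

        System.out.println("kept startTime, timed out at retry start: " + keptStartTime);
        System.out.println("reset startTime, timed out at retry start: " + resetStartTime);
    }
}
```

With the original startTime kept, the retry has already used up its whole timeout budget the moment it starts; with startTime reset, it gets the full 30 minutes. If the real check uses a strict comparison instead, the retry would time out just after it starts rather than exactly at the start.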