panhongan edited a comment on pull request #11296:
URL: https://github.com/apache/druid/pull/11296#issuecomment-1023854823
> is this change (#12167) by any chance? We were not handling the pausing
tasks really well.
Your change does fix a real bug, but that bug is not the root cause.
In our production cluster, an ingestion task received a pause request while
disk usage was high, so the "persist action" took a long time (about
10 minutes) and the "pausing task future" timed out.
In `SeekableStreamSupervisor`:
```
this.futureTimeoutInSeconds = Math.max(
    MINIMUM_FUTURE_TIMEOUT_IN_SECONDS,
    tuningConfig.getChatRetries()
    * (tuningConfig.getHttpTimeout().getStandardSeconds() + IndexTaskClient.MAX_RETRY_WAIT_SECONDS)
);
// In our production this value is about max(120, 8 * (10s + 10s)) = 160s.

// checkTaskDuration():
Futures.successfulAsList(futures).get(futureTimeoutInSeconds, TimeUnit.SECONDS);
```
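To make the arithmetic concrete, here is a minimal standalone sketch of that formula using the production values quoted above. The class and parameter names are made up for illustration and are not part of Druid.

```java
public class FutureTimeoutSketch {
    // Same shape as the formula above: max(minimum, chatRetries * (httpTimeout + maxRetryWait)).
    // All parameter names are illustrative, not Druid identifiers.
    static long futureTimeoutSeconds(
        long minimumSeconds,
        long chatRetries,
        long httpTimeoutSeconds,
        long maxRetryWaitSeconds
    ) {
        return Math.max(minimumSeconds, chatRetries * (httpTimeoutSeconds + maxRetryWaitSeconds));
    }

    public static void main(String[] args) {
        // Values from our production: minimum 120s, 8 chat retries, 10s HTTP timeout, 10s max retry wait.
        System.out.println(futureTimeoutSeconds(120, 8, 10, 10)); // prints 160
    }
}
```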
And in `SeekableStreamIndexTaskClient::pause()`, even with that bug fixed, it
takes more than 3435s to break out of the **while** loop.
```
while (true) {
  final Duration delay = retryPolicy.getAndIncrementRetryDelay();
  if (delay == null) { // need 3435 seconds to become null
    throw new ISE(
        "Task [%s] failed to change its status from [%s] to [%s], aborting",
        id,
        status,
        SeekableStreamIndexTaskRunner.Status.PAUSED
    );
  }
}
```
So that is the problem: futureTimeout << pausingRetryDuration.
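The mismatch can be reproduced outside Druid. The following is only a scaled-down sketch (milliseconds instead of 160s against 3435s), with hypothetical names: the waiter gives up on `Future.get(timeout)` while the work behind the future is still running.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutMismatchSketch {
    // Returns true if the waiter timed out while the worker was still busy.
    // waitMillis stands in for futureTimeout, workMillis for the pause retry loop.
    static boolean waiterGivesUpFirst(long waitMillis, long workMillis) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<String> pause = exec.submit(() -> {
                Thread.sleep(workMillis); // placeholder for the long-running retry loop
                return "PAUSED";
            });
            try {
                pause.get(waitMillis, TimeUnit.MILLISECONDS);
                return false; // worker finished within the waiter's timeout
            } catch (TimeoutException e) {
                return !pause.isDone(); // waiter gave up, worker is still going
            }
        } finally {
            exec.shutdownNow(); // interrupt the worker so the sketch exits promptly
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(waiterGivesUpFirst(50, 2000)); // waiter times out first
    }
}
```

In the sketch the worker is interrupted on exit; nothing in the timeout path itself stops it, which is the situation described above.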
Reducing the delay duration or the number of retries will not help us either.
What we need is strict control over ingestion tasks that does not depend on
timeouts. That is the goal of my change.