jihoonson commented on a change in pull request #10213:
URL: https://github.com/apache/druid/pull/10213#discussion_r594562633
##########
File path:
server/src/main/java/org/apache/druid/server/coordinator/CuratorLoadQueuePeon.java
##########
@@ -361,21 +386,33 @@ private void entryRemoved(SegmentHolder segmentHolder,
String path)
);
}
- private void failAssign(SegmentHolder segmentHolder)
+ private void failAssign(SegmentHolder segmentHolder, boolean handleTimeout)
{
- failAssign(segmentHolder, null);
+ failAssign(segmentHolder, handleTimeout, null);
}
- private void failAssign(SegmentHolder segmentHolder, Exception e)
+ private void failAssign(SegmentHolder segmentHolder, boolean handleTimeout,
Exception e)
{
if (e != null) {
log.error(e, "Server[%s], throwable caught when submitting [%s].",
basePath, segmentHolder);
Review comment:
> Alerting sounds like a good idea, but my concern is that since the
alert would happen per segment, a slowness on the historical side can generate
a large number of alerts for a fairly large cluster. What do you think?
I think it's a valid concern. We may be able to emit those exceptions in
bulk if they are thrown in a short time frame. I believe this should be done in
a separate PR even if we want, and thus my comment is not a blocker for this PR.
> Also as a followup PR I was planning to add the timedOut segment list to
the `/druid/coordinator/v1/loadqueue` along with some docs about its usage in
understanding the cluster behavior.
Thanks. It sounds good to me.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]