[ https://issues.apache.org/jira/browse/RATIS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795890#comment-17795890 ]
Tsz-wo Sze commented on RATIS-1966:
-----------------------------------
I guess you also saw the following log many times?
{code}
// LeaderStateImpl.restart
LOG.info("{}: Restarting {} for {}", this,
    JavaUtils.getClassSimpleName(sender.getClass()), info.getName());
{code}
bq. We can print the first exception of StatusRuntimeException (Unavailable IO)
and start batching n logs (print once) during exceptional periods. ...
Yes. If we can detect that the log message is the same, we may even skip
printing it repeatedly. Just print it once in a while (say, every 5 seconds),
together with the count of suppressed messages.
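
A minimal sketch of that idea follows, under stated assumptions. The class name RepeatedLogSuppressor and its filter method are hypothetical and are not part of Ratis. The sketch keys on the exact message text, and it takes the clock as a parameter so that the interval can be tested without sleeping.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Ratis class): lets an identical message through
 * at most once per interval and reports how many repeats were skipped.
 */
class RepeatedLogSuppressor {
  /** Per-message bookkeeping; only touched inside Map.compute. */
  private static final class State {
    long lastPrintedNanos;
    long suppressed;

    State(long now) {
      this.lastPrintedNanos = now;
    }
  }

  private final long intervalNanos;
  private final LongSupplier nanoClock;
  // Note: grows with the number of distinct messages; a real version would need eviction.
  private final Map<String, State> states = new ConcurrentHashMap<>();

  RepeatedLogSuppressor(long intervalNanos, LongSupplier nanoClock) {
    this.intervalNanos = intervalNanos;
    this.nanoClock = nanoClock;
  }

  /** Returns the text to print, or empty if this occurrence should be skipped. */
  Optional<String> filter(String message) {
    final long now = nanoClock.getAsLong();
    final String[] out = new String[1];
    // compute() is atomic per key, so concurrent callers cannot lose a count.
    states.compute(message, (key, s) -> {
      if (s == null) {
        out[0] = message; // first occurrence: always print
        return new State(now);
      }
      if (now - s.lastPrintedNanos >= intervalNanos) {
        out[0] = s.suppressed == 0
            ? message
            : message + " (suppressed " + s.suppressed + " similar messages)";
        s.lastPrintedNanos = now;
        s.suppressed = 0;
      } else {
        s.suppressed++; // inside the interval: count it, do not print
      }
      return s;
    });
    return Optional.ofNullable(out[0]);
  }
}
```

A caller could construct it with System::nanoTime and a 5-second interval, then log only when filter returns a value, e.g. suppressor.filter(msg).ifPresent(LOG::warn). Where exactly to hook this into the gRPC log appender is left open here.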
> Warning logs flooding when a peer is down
> -----------------------------------------
>
> Key: RATIS-1966
> URL: https://issues.apache.org/jira/browse/RATIS-1966
> Project: Ratis
> Issue Type: Improvement
> Components: gRPC
> Affects Versions: 3.0.0
> Reporter: Song Ziyang
> Assignee: Song Ziyang
> Priority: Minor
> Attachments: image-2023-12-12-11-02-46-776.png,
> image-2023-12-12-11-14-02-145.png
>
>
> Warnings like
> 2023-12-12 10:58:01,764 [grpc-default-executor-6] WARN
> o.a.ratis.util.LogUtils:121 -
> 2@group-000100000010->3-AppendLogResponseHandler: Failed appendEntries:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> flood the log while a follower is down.
> Thousands of log lines are printed within several tens of seconds.
> !image-2023-12-12-11-14-02-145.png!