[ 
https://issues.apache.org/jira/browse/RATIS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795890#comment-17795890
 ] 

Tsz-wo Sze commented on RATIS-1966:
-----------------------------------

I guess you also saw the following log many times?
{code}
// LeaderStateImpl.restart
    LOG.info("{}: Restarting {} for {}", this,
        JavaUtils.getClassSimpleName(sender.getClass()), info.getName());
{code}

bq. We can print the first exception of StatusRuntimeException (Unavailable IO) 
and start batching n logs (print once) during exceptional periods. ...

Yes.  If we can detect that the log message is the same as the previous one, 
we may even skip printing it repeatedly.  Just print it once in a while (say 
every 5 seconds), along with the count of repeats.
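
A minimal sketch of that idea, assuming identical messages can be detected by 
comparing the message string.  RepeatedLogSuppressor is a hypothetical helper, 
not an existing Ratis class; the clock and the sink are injected here only to 
keep the example self-contained, and real code would pass the message on to 
the logger instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Ratis): prints the first occurrence of a
 * message immediately, then prints the same message at most once per interval
 * together with the number of repeats that were skipped in between.
 */
class RepeatedLogSuppressor {
  /** Per-message bookkeeping; only accessed inside Map.compute. */
  private static final class State {
    long lastPrintedMs;
    int suppressed;

    State(long nowMs) {
      this.lastPrintedMs = nowMs;
    }
  }

  private final long intervalMs;
  private final LongSupplier clockMs;   // assumed time source, e.g. System::currentTimeMillis
  private final Consumer<String> sink;  // assumed output, e.g. msg -> LOG.warn(msg)
  // Note: grows with the number of distinct messages; a real version should bound it.
  private final Map<String, State> states = new ConcurrentHashMap<>();

  RepeatedLogSuppressor(long intervalMs, LongSupplier clockMs, Consumer<String> sink) {
    this.intervalMs = intervalMs;
    this.clockMs = clockMs;
    this.sink = sink;
  }

  void log(String message) {
    final long now = clockMs.getAsLong();
    final String[] toPrint = new String[1];
    // compute() runs atomically per key, so concurrent callers do not lose counts.
    states.compute(message, (key, state) -> {
      if (state == null) {
        toPrint[0] = message;            // first occurrence: print as is
        return new State(now);
      }
      if (now - state.lastPrintedMs >= intervalMs) {
        toPrint[0] = state.suppressed > 0
            ? message + " (repeated " + state.suppressed + " more times since last print)"
            : message;
        state.lastPrintedMs = now;
        state.suppressed = 0;
      } else {
        state.suppressed++;              // same message within the interval: only count it
      }
      return state;
    });
    if (toPrint[0] != null) {
      sink.accept(toPrint[0]);           // print outside the map lock
    }
  }
}
```

With a 5000 ms interval, something like 
new RepeatedLogSuppressor(5000, System::currentTimeMillis, System.out::println) 
would print a repeated "Failed appendEntries" message once, stay silent for 
the next 5 seconds, and then print it again with the repeat count.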


> Warning logs flooding when a peer is down
> -----------------------------------------
>
>                 Key: RATIS-1966
>                 URL: https://issues.apache.org/jira/browse/RATIS-1966
>             Project: Ratis
>          Issue Type: Improvement
>          Components: gRPC
>    Affects Versions: 3.0.0
>            Reporter: Song Ziyang
>            Assignee: Song Ziyang
>            Priority: Minor
>         Attachments: image-2023-12-12-11-02-46-776.png, 
> image-2023-12-12-11-14-02-145.png
>
>
> Warnings like 
> 2023-12-12 10:58:01,764 [grpc-default-executor-6] WARN  
> o.a.ratis.util.LogUtils:121 - 
> 2@group-000100000010->3-AppendLogResponseHandler: Failed appendEntries: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> are flooding the log when a follower is currently down.
> Thousands of log lines are printed within several tens of seconds.
> !image-2023-12-12-11-14-02-145.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
