[ 
https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922783#comment-16922783
 ] 

Josh Elser commented on RATIS-556:
----------------------------------

{code:java}
+                                    RaftClient client = RaftClient.newBuilder().
+                                            setRaftGroup(group).setProperties(properties).build();
{code}
This {{RaftClient}} needs to be closed in {{PeerHealthChecker}}.
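One option, sketched here with a hypothetical stand-in rather than the real client, is try-with-resources, which should apply as long as {{RaftClient}} is {{Closeable}}. {{StubClient}}, {{probe}} and {{checkPeer}} are illustrative names and are not part of the patch.

```java
import java.io.Closeable;

// Hypothetical stand-in for RaftClient; only the Closeable contract matters here.
final class StubClient implements Closeable {
    private boolean closed;

    String probe() {
        if (closed) {
            throw new IllegalStateException("client already closed");
        }
        return "ok";
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class PeerCheckSketch {
    // Kept only so a caller can confirm the client was closed afterwards.
    static StubClient lastClient;

    static String checkPeer() {
        try (StubClient client = new StubClient()) {
            lastClient = client;
            return client.probe();
        } // close() runs here, even if probe() throws
    }

    public static void main(String[] args) {
        System.out.println(checkPeer() + " closed=" + lastClient.isClosed());
    }
}
```

If the client has to outlive a single check, closing it from whatever shuts the health checker down would be the alternative.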
{code:java}
                 } catch (IOException e) {
                     LOG.error(
-                        "Exception while registring raft group with Metadata Service during creation of log");
+                        "Exception while registering raft group with Metadata Service during creation of log");
                     e.printStackTrace();
{code}
Remove the {{printStackTrace}} call and pass {{e}} as the last argument to {{LOG.error}} so the stack trace ends up in the log.
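To illustrate the idea without extra dependencies, here is a minimal sketch using {{java.util.logging}} as a stand-in for the logger in the patch. The class name, the {{Capture}} handler and the exception text are made up for the example; with an SLF4J logger the equivalent is passing {{e}} as the final argument to {{LOG.error}}.

```java
import java.io.IOException;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class LogWithCauseSketch {
    private static final Logger LOG = Logger.getLogger("LogWithCauseSketch");

    // Remembers the last record so the attached exception can be inspected.
    static final class Capture extends Handler {
        LogRecord last;

        @Override
        public void publish(LogRecord record) {
            last = record;
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    static LogRecord logFailure() {
        Capture capture = new Capture();
        LOG.addHandler(capture);
        try {
            // Simulated failure; stands in for the metadata service call.
            throw new IOException("metadata service unreachable");
        } catch (IOException e) {
            // The exception travels with the log record instead of going to stderr.
            LOG.log(Level.SEVERE,
                "Exception while registering raft group with Metadata Service during creation of log",
                e);
        } finally {
            LOG.removeHandler(capture);
        }
        return capture.last;
    }

    public static void main(String[] args) {
        System.out.println(logFailure().getThrown());
    }
}
```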
{code:java}
+                        if((now - heartbeatTimestamp) > failureDetectionPeriod) {
{code}
Can you add a log message as the first thing inside this branch, something like:
{code:java}
LOG.warn("Closing all logs hosted by peer {} because last heartbeat ({}ms) exceeds the threshold ({}ms)",
    raftPeer, now - heartbeatTimestamp, failureDetectionPeriod);
{code}
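For reference, the check and the message can be exercised in isolation. This is only a sketch: {{HeartbeatCheckSketch}}, {{isPeerFailed}} and {{describe}} are hypothetical names, and {{String.format}} stands in for the logger's placeholder substitution.

```java
final class HeartbeatCheckSketch {
    // Same comparison as in the patch: strictly greater than the threshold.
    static boolean isPeerFailed(long now, long heartbeatTimestamp, long failureDetectionPeriod) {
        return (now - heartbeatTimestamp) > failureDetectionPeriod;
    }

    // Builds the text of the suggested warning for a given peer.
    static String describe(String raftPeer, long now, long heartbeatTimestamp,
                           long failureDetectionPeriod) {
        return String.format(
            "Closing all logs hosted by peer %s because last heartbeat (%dms) exceeds the threshold (%dms)",
            raftPeer, now - heartbeatTimestamp, failureDetectionPeriod);
    }

    public static void main(String[] args) {
        long now = 10_000L;
        long heartbeatTimestamp = 1_000L;
        long failureDetectionPeriod = 5_000L;
        if (isPeerFailed(now, heartbeatTimestamp, failureDetectionPeriod)) {
            System.out.println(describe("peer1", now, heartbeatTimestamp, failureDetectionPeriod));
        }
    }
}
```

Note that a peer whose last heartbeat is exactly {{failureDetectionPeriod}} old is not treated as failed, since the comparison is strict.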
Otherwise, this looks good. Let me try it out on the docker-compose infra.

> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
>                 Key: RATIS-556
>                 URL: https://issues.apache.org/jira/browse/RATIS-556
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>         Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, 
> RATIS-556_v2.patch, RATIS-556_v3.patch, RATIS-556_v4.patch
>
>
> Currently there is no way to detect the node failures at master log servers 
> and add new nodes to the group serving the log. We need to analyze how Ozone 
> is working in this case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)