[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922783#comment-16922783 ]
Josh Elser commented on RATIS-556:
----------------------------------
{code:java}
+ RaftClient client = RaftClient.newBuilder().
+     setRaftGroup(group).setProperties(properties).build(); {code}
Need to close this RaftClient in PeerHealthChecker.
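One way to do that, assuming the client is only needed for the duration of a single health check and that RaftClient is Closeable, is try-with-resources. The sketch below uses a hypothetical {{FakeClient}} stand-in so it compiles without Ratis on the classpath; in the patch, the builder call above would take the place of {{new FakeClient()}}.

```java
// Hypothetical stand-in for RaftClient; not part of Ratis.
class FakeClient implements AutoCloseable {
    private boolean closed = false;

    // Placeholder for whatever request the health check sends.
    String ping() {
        if (closed) {
            throw new IllegalStateException("client already closed");
        }
        return "ok";
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class PeerHealthCheckSketch {
    // Creates a client, uses it once, and closes it even if ping() throws.
    static FakeClient checkOnce() {
        FakeClient client = new FakeClient();
        try (FakeClient c = client) {
            c.ping();
        }
        // Returned only so a caller can observe that close() ran.
        return client;
    }
}
```

If the checker instead keeps one long-lived client as a field, the equivalent would be to close it from whatever shutdown hook PeerHealthChecker already has.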
{code:java}
} catch (IOException e) {
  LOG.error(
-   "Exception while registring raft group with Metadata Service during creation of log");
+   "Exception while registering raft group with Metadata Service during creation of log");
  e.printStackTrace(); {code}
Remove the {{printStackTrace}} and add {{e}} as an argument to the
{{LOG.error}}.
{code:java}
+ if((now - heartbeatTimestamp) > failureDetectionPeriod) { {code}
Can you add a log message right away like...
{code:java}
LOG.warn("Closing all logs hosted by peer {} because last heartbeat ({}ms) " +
    "exceeds the threshold ({}ms)", raftPeer, now - heartbeatTimestamp,
    failureDetectionPeriod);{code}
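To make the threshold semantics concrete, here is a minimal standalone sketch of the same comparison. The class and method names are hypothetical, and {{describe}} only renders the message text with {{String.format}}; the real change would pass the arguments to {{LOG.warn}} as shown above.

```java
class HeartbeatCheckSketch {
    // True when the time since the last heartbeat strictly exceeds the allowed period.
    static boolean isExpired(long now, long heartbeatTimestamp, long failureDetectionPeriod) {
        return (now - heartbeatTimestamp) > failureDetectionPeriod;
    }

    // Renders the suggested warning text for a peer whose heartbeat is overdue.
    static String describe(String raftPeer, long now, long heartbeatTimestamp,
                           long failureDetectionPeriod) {
        return String.format(
            "Closing all logs hosted by peer %s because last heartbeat (%dms) "
                + "exceeds the threshold (%dms)",
            raftPeer, now - heartbeatTimestamp, failureDetectionPeriod);
    }
}
```

Because the comparison is strict, a peer whose last heartbeat is exactly {{failureDetectionPeriod}} old is still treated as healthy.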
Otherwise, this looks good. Let me try it out on the docker-compose infra.
> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
> Key: RATIS-556
> URL: https://issues.apache.org/jira/browse/RATIS-556
> Project: Ratis
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
> Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch,
> RATIS-556_v2.patch, RATIS-556_v3.patch, RATIS-556_v4.patch
>
>
> Currently there is no way to detect node failures at the master log servers
> and add new nodes to the group serving the log. We need to analyze how Ozone
> handles this case.