[
https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919741#comment-16919741
]
Josh Elser commented on RATIS-556:
----------------------------------
{code:java}
RaftClient client = RaftClient.newBuilder()
    .setRaftGroup(group).setProperties(properties).build();
try {
  RaftClientReply reply = client.send(
      () -> LogServiceProtoUtil.toChangeStateRequestProto(logName, LogStream.State.CLOSED)
          .toByteString());
{code}
We should log a prominent WARN message here saying that we're closing the log.
{code:java}
+ LogServiceProtos.ChangeStateReplyProto message =
+     LogServiceProtos.ChangeStateReplyProto.parseFrom(reply.getMessage().getContent());
{code}
Should we be checking anything in this Reply?
{code:java}
+ final PeerGroups[] peerGroupsToRemove = new PeerGroups[1];
+ // remove peer from groups
+ avail.stream().forEach(peerGroup -> {
+   if(peerGroup.getPeer().equals(raftPeer)) {
+     peerGroupsToRemove[0] = peerGroup;
+   }
+ });
+ if(peerGroupsToRemove.length > 0) {
+   avail.remove(peerGroupsToRemove[0]);
+ }
{code}
This isn't quite right. {{(new PeerGroups[1]).length}} is always 1, so the
{{length > 0}} check always passes, but {{peerGroupsToRemove[0]}} may still be
null if no peer matched. Make this a List and just append (potentially)
multiple {{PeerGroups}} to it?
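To illustrate the List-based suggestion, here is a minimal sketch. The {{PeerGroups}} class below is a hypothetical stand-in (only {{getPeer()}} is taken from the patch), the peer is modeled as a plain String, and {{avail}} is assumed to be some {{java.util.Collection}}; the real types in the patch may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

class PeerGroupRemoval {
  // Hypothetical stand-in for the patch's PeerGroups; the peer is a String here for brevity.
  static final class PeerGroups {
    private final String peer;
    PeerGroups(String peer) { this.peer = peer; }
    String getPeer() { return peer; }
  }

  // Collects every entry that belongs to the failed peer, then removes them all.
  // Returns how many entries were removed (0 if the peer was not present).
  static int removePeer(Collection<PeerGroups> avail, String raftPeer) {
    final List<PeerGroups> toRemove = new ArrayList<>();
    for (PeerGroups peerGroup : avail) {
      if (Objects.equals(peerGroup.getPeer(), raftPeer)) {
        toRemove.add(peerGroup);
      }
    }
    // An empty list is a no-op, so no null or length check is needed.
    avail.removeAll(toRemove);
    return toRemove.size();
  }
}
```

If the number of removed entries is not needed, {{avail.removeIf(pg -> pg.getPeer().equals(raftPeer))}} would likely do the same thing in one call.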
The test you have is nice and concise!
Would it be possible to modify that test, or add a new one, to make sure that
the contents of each data structure we maintain are kept in sync? I am talking
about {{map}}, {{peers}}, {{peerLogs}}, {{heartbeatInfo}} and {{avail}}.
Whatever approach you think is easiest to test would be good. We wouldn't want
these data structures to drift out of sync (as they would just leak memory).
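One possible shape for such a check, as a rough sketch only: the helper below is hypothetical and assumes the test can project each structure ({{map}}, {{peers}}, {{peerLogs}}, {{heartbeatInfo}}, {{avail}}) down to the collection of peer IDs it currently references. The actual field types in the patch are not shown here, so that projection is left to the test.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

class PeerStateSyncCheck {
  // Hypothetical helper: given structure name -> peer IDs it references,
  // returns the names of the structures that still mention the removed peer.
  static List<String> structuresStillReferencing(
      String removedPeer, Map<String, ? extends Collection<String>> peerIdsByStructure) {
    final List<String> leaking = new ArrayList<>();
    for (Map.Entry<String, ? extends Collection<String>> e : peerIdsByStructure.entrySet()) {
      if (e.getValue().contains(removedPeer)) {
        leaking.add(e.getKey());
      }
    }
    return leaking;
  }
}
```

A test could then fail a peer, build the name-to-IDs map from the five structures, and assert that the returned list is empty.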
> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
> Key: RATIS-556
> URL: https://issues.apache.org/jira/browse/RATIS-556
> Project: Ratis
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
> Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch,
> RATIS-556_v2.patch, RATIS-556_v3.patch
>
>
> Currently there is no way to detect the node failures at master log servers
> and add new nodes to the group serving the log. We need to analyze how Ozone
> is working in this case.