[jira] [Commented] (RATIS-556) Detect node failures and close the log to prevent additional writes

Josh Elser (Jira) Fri, 30 Aug 2019 10:20:15 -0700


    [ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919741#comment-16919741 ]


Josh Elser commented on RATIS-556:
----------------------------------

{code:java}
RaftClient client = RaftClient.newBuilder()
        .setRaftGroup(group).setProperties(properties).build();
try {
    RaftClientReply reply = client.send(
            () -> LogServiceProtoUtil.toChangeStateRequestProto(logName, LogStream.State.CLOSED)
                    .toByteString());
{code}
We should log a prominent WARN message stating that we're closing the log.
{code:java}
+ LogServiceProtos.ChangeStateReplyProto message =
+         LogServiceProtos.ChangeStateReplyProto.parseFrom(reply.getMessage().getContent());
{code}
Should we be checking anything in this Reply?
{code:java}
+ final PeerGroups[] peerGroupsToRemove = new PeerGroups[1];
+ // remove peer from groups 
+ avail.stream().forEach(peerGroup -> { 
+     if(peerGroup.getPeer().equals(raftPeer)) { 
+         peerGroupsToRemove[0] = peerGroup; 
+     } 
+ }); 
+ if(peerGroupsToRemove.length > 0) { 
+     avail.remove(peerGroupsToRemove[0]); 
+ }{code}
This isn't quite right. {{(new PeerGroups[1]).length}} is always 1, so the {{length > 0}} check always passes, even when no peer matched and {{peerGroupsToRemove[0]}} is still null. The loop also keeps only the last match. Make this a List and just append (potentially) multiple {{PeerGroups}} to it?

The test you have is nice and concise!

Would it be possible to modify that test, or add a new one, that verifies the contents of every data structure we maintain stay in sync? I'm referring to {{map}}, {{peers}}, {{peerLogs}}, {{heartbeatInfo}}, and {{avail}}. Whatever approach you find easiest to test is fine. We don't want these data structures to drift out of sync, since stale entries would just leak memory.
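One possible shape for such a check: a helper the test could call after each add/remove that returns a list of inconsistencies (empty means everything agrees). The types are assumptions for illustration only: {{map}} as log name -> peer ids, {{peerLogs}} as peer id -> log names, {{heartbeatInfo}} as peer id -> last heartbeat time, and {{availPeers}} as the peer id of each {{avail}} entry. The real structures may hold {{RaftPeer}}/{{PeerGroups}} objects instead, and {{MetadataConsistency}} is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical consistency check; all parameter shapes are assumptions. */
final class MetadataConsistency {
    static List<String> check(Map<String, Set<String>> map,
                              Set<String> peers,
                              Map<String, Set<String>> peerLogs,
                              Map<String, Long> heartbeatInfo,
                              Collection<String> availPeers) {
        List<String> problems = new ArrayList<>();
        // Every per-peer structure should track exactly the known peers.
        if (!peerLogs.keySet().equals(peers)) {
            problems.add("peerLogs keys differ from peers");
        }
        if (!heartbeatInfo.keySet().equals(peers)) {
            problems.add("heartbeatInfo keys differ from peers");
        }
        if (!new HashSet<>(availPeers).equals(peers)) {
            problems.add("avail differs from peers");
        }
        // Every log -> peer assignment in map should be mirrored in peerLogs.
        for (Map.Entry<String, Set<String>> e : map.entrySet()) {
            for (String peer : e.getValue()) {
                Set<String> logs = peerLogs.get(peer);
                if (!peers.contains(peer) || logs == null || !logs.contains(e.getKey())) {
                    problems.add("log " + e.getKey() + " -> peer " + peer + " not mirrored in peerLogs");
                }
            }
        }
        // And every peer -> log entry in peerLogs should be mirrored in map.
        for (Map.Entry<String, Set<String>> e : peerLogs.entrySet()) {
            for (String log : e.getValue()) {
                Set<String> logPeers = map.get(log);
                if (logPeers == null || !logPeers.contains(e.getKey())) {
                    problems.add("peer " + e.getKey() + " -> log " + log + " not mirrored in map");
                }
            }
        }
        return problems;
    }
}
```

The test would then assert {{check(...).isEmpty()}} after registering peers, creating logs, and removing a failed peer.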

> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
>                 Key: RATIS-556
>                 URL: https://issues.apache.org/jira/browse/RATIS-556
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>         Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch, RATIS-556_v2.patch, RATIS-556_v3.patch
>
>
> Currently there is no way to detect node failures at the master log servers 
> and add new nodes to the group serving the log. We need to analyze how Ozone 
> handles this case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
