Marco Baldessari created GEODE-9906:
---------------------------------------

             Summary: Unable to reconnect a node after OS patching "15 seconds 
have elapsed while waiting for replies"
                 Key: GEODE-9906
                 URL: https://issues.apache.org/jira/browse/GEODE-9906
             Project: Geode
          Issue Type: Bug
            Reporter: Marco Baldessari


I have a cluster of 4 nodes in total (3 servers and 1 management node) that 
was working properly.

At the beginning of the month we started a planned OS patching, beginning 
with the first server node and following this procedure:

- Stop service
- OS patching
- Server restart
- Start service

The service on the first patched node, "serverA", fails to restart with 
this error:

Log entries during cluster join:
serverA:
| INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | 
--> Connection: shared=true ordered=false failed to connect to peer 
10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: 
java.net.ConnectException: Connection timed out (Connection timed out)
| WARN  | region-dm-12               | ache.geode.internal.tcp.Connection | --> 
Connection: Attempting reconnect to peer  10.237.110.195( Server 
serverB:9993)<ec><v127>:1024
 
ServerMgmt:
| WARN  | pool-3-thread-1              | tributed.internal.ReplyProcessor21     
| --> 15 seconds have elapsed while waiting for replies: 
<CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies 
from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225( 
Management:6033)<ec><v111>:1024 whose current membership list is: 
[[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( 
Management:6033)<ec><v111>:1024, 10.237.110.195( Server 
serverB:9993)<ec><v127>:1024, 10.237.110.194( Server 
serverA:632)<ec><v174>:1024]]
 
Connectivity between the systems was verified with tcpdump; UDP traffic on 
port 1024 is flowing correctly.
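
Since the serverA log reports a java.net.ConnectException on a tcp.Connection, 
a plain TCP connect test towards the peer may be a useful complement to the 
UDP check. The following is only a minimal sketch: TcpProbe and canConnect are 
hypothetical names (not part of Geode), the 5-second timeout is an arbitrary 
choice, and the host and TCP port to test must be supplied by hand because the 
logs above do not state which TCP port the peer connection uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Geode: checks whether a TCP connection
// to host:port can be established within a given timeout.
class TcpProbe {

    // Returns true if the TCP handshake completes within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections and timeouts
            // (SocketTimeoutException is a subclass of IOException).
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java TcpProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // 5000 ms is an assumed timeout for illustration only.
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + " reachable over TCP: " + ok);
    }
}
```

It would be run from serverA against serverB (and in the opposite direction), 
passing the peer address and the TCP port under test as arguments.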
 
We have tried redeploying the service and restarting it numerous times, but 
we always get the same error during startup.

Any ideas? Thank you.

Marco.



