[ https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466018#comment-17466018 ]

Marco Baldessari commented on GEODE-9906:
-----------------------------------------

The Geode version is {*}1.0.0-incubating{*}. The full logs of the involved nodes are below.

*serverA*

2021-12-21T10:48:37,572 | INFO  | region-dm-12                 | stributed.internal.InternalLocator | --> Starting peer location for Distribution Locator on serverA/10.237.110.194[0]
2021-12-21T10:48:37,575 | INFO  | region-dm-12                 | stributed.internal.InternalLocator | --> Starting Distribution Locator on serverA/10.237.110.194[0]
2021-12-21T10:48:37,575 | INFO  | region-dm-12                 | buted.internal.tcpserver.TcpServer | --> Locator was created at Tue Dec 21 10:48:37 CET 2021
2021-12-21T10:48:37,575 | INFO  | region-dm-12                 | buted.internal.tcpserver.TcpServer | --> Listening on port 2222 bound on address serverA/10.237.110.194
2021-12-21T10:48:37,576 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> GemFire peer location service starting.  Other locators: serverA[2222],serverB[2222],serverC[2222],Management[2424]  Locators preferred as coordinators: true  Network partition detection enabled: true  View persistence file: locator2222view.dat
2021-12-21T10:48:37,576 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator attempting to recover from serverA/10.237.110.194:2222
2021-12-21T10:48:37,881 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator was unable to recover state from this locator
2021-12-21T10:48:37,881 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator attempting to recover from serverB/10.237.110.195:2222
2021-12-21T10:48:38,134 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator recovered initial membership of View[10.237.110.225( Management:6033)<ec><v111>:1024|185] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024]  crashed: [10.237.110.194( Server serverA:636)<ec><v184>:1024]
2021-12-21T10:48:38,134 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator recovered state from serverB/10.237.110.195:2222
2021-12-21T10:48:38,146 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Serial Queue info : THROTTLE_PERCENT: 0.75 SERIAL_QUEUE_BYTE_LIMIT :41943040 SERIAL_QUEUE_THROTTLE :31457280 TOTAL_SERIAL_QUEUE_BYTE_LIMIT :83886080 TOTAL_SERIAL_QUEUE_THROTTLE :31457280 SERIAL_QUEUE_SIZE_LIMIT :20000 SERIAL_QUEUE_SIZE_THROTTLE :15000
2021-12-21T10:48:38,387 | INFO  | region-dm-12                 | .membership.gms.locator.GMSLocator | --> Peer locator is connecting to local membership services
2021-12-21T10:48:38,387 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> Starting membership services
2021-12-21T10:48:38,463 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> JGroups channel created (took 76ms)
2021-12-21T10:48:38,479 | INFO  | region-dm-12                 | uted.internal.direct.DirectChannel | --> GemFire P2P Listener started on  null
2021-12-21T10:48:38,479 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> This member is hosting a locator will be preferred as a membership coordinator
2021-12-21T10:48:38,482 | INFO  | re Detection Server thread 0 | d.internal.membership.gms.Services | --> Started failure detection server thread on /10.237.110.194:19535.
2021-12-21T10:48:39,333 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> Attempting to join the distributed system through coordinator 10.237.110.225( Management:6033)<ec><v111>:1024 using address 10.237.110.194( Server serverA:10242)<ec>:1024
2021-12-21T10:48:39,649 | INFO  | t receiver,serverA-33868 | d.internal.membership.gms.Services | --> received new view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
old view is: null
2021-12-21T10:48:39,650 | INFO  | t receiver,serverA-33868 | .membership.gms.locator.GMSLocator | --> Peer locator received new membership view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,656 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> Joined the distributed system (took  1'170  ms)
2021-12-21T10:48:39,656 | INFO  | region-dm-12                 | d.internal.membership.gms.Services | --> Finished joining (took 1170ms).
2021-12-21T10:48:39,657 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Starting DistributionManager 10.237.110.194( Server serverA:10242)<ec><v186>:1024.  (took 1491 ms)
2021-12-21T10:48:39,659 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Initial (distribution manager) view =  View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,659 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Admitting member <10.237.110.225( Management:6033)<ec><v111>:1024>. Now there are 1 non-admin member(s).
2021-12-21T10:48:39,659 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Admitting member <10.237.110.195( Server serverB:9993)<ec><v127>:1024>. Now there are 2 non-admin member(s).
2021-12-21T10:48:39,659 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Admitting member <10.237.110.196( Server serverC:16805)<ec><v136>:1024>. Now there are 3 non-admin member(s).
2021-12-21T10:48:39,659 | INFO  | region-dm-12                 | buted.internal.DistributionManager | --> Admitting member <10.237.110.194( Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).
2021-12-21T10:50:49,787 | INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
2021-12-21T10:50:51,797 | WARN  | region-dm-12                 | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer  10.237.110.195( Server serverB:9993)<ec><v127>:1024

*serverB*

2021-12-21T10:48:39,657 | INFO  | t receiver,serverB-42516 | d.internal.membership.gms.Services | --> received new view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
old view is: View[10.237.110.225( Management:6033)<ec><v111>:1024|185] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024]  crashed: [10.237.110.194( Server serverA:636)<ec><v184>:1024]
2021-12-21T10:48:39,661 | INFO  | t receiver,serverB-42516 | .membership.gms.locator.GMSLocator | --> Peer locator received new membership view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,662 | INFO  | View Message Processor       | d.internal.membership.gms.Services | --> Membership: Processing addition < 10.237.110.194( Server serverA:10242)<ec><v186>:1024 >
2021-12-21T10:48:39,662 | INFO  | View Message Processor       | buted.internal.DistributionManager | --> Admitting member <10.237.110.194( Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).

*Management*

2021-12-21T10:48:39,342 | INFO  | st receiver,Management-46835 | d.internal.membership.gms.Services | --> received join request from 10.237.110.194( Server serverA:10242)<ec>:1024
2021-12-21T10:48:39,647 | INFO  | eode Membership View Creator | d.internal.membership.gms.Services | --> View Creator is processing 1 requests for the next membership view
2021-12-21T10:48:39,648 | INFO  | eode Membership View Creator | d.internal.membership.gms.Services | --> preparing new view View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
failure detection ports: 41363 30800 1768 19535
2021-12-21T10:48:39,656 | INFO  | eode Membership View Creator | d.internal.membership.gms.Services | --> finished waiting for responses to view preparation
2021-12-21T10:48:39,657 | INFO  | eode Membership View Creator | d.internal.membership.gms.Services | --> received new view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
old view is: View[10.237.110.225( Management:6033)<ec><v111>:1024|185] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024]  crashed: [10.237.110.194( Server serverA:636)<ec><v184>:1024]
2021-12-21T10:48:39,657 | INFO  | eode Membership View Creator | .membership.gms.locator.GMSLocator | --> Peer locator received new membership view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,657 | INFO  | eode Membership View Creator | d.internal.membership.gms.Services | --> sending new view View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members: [10.237.110.225( Management:6033)<ec><v111>:1024{lead}, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]
failure detection ports: 41363 30800 1768 19535
2021-12-21T10:48:39,657 | INFO  | View Message Processor       | d.internal.membership.gms.Services | --> Membership: Processing addition < 10.237.110.194( Server serverA:10242)<ec><v186>:1024 >
2021-12-21T10:48:39,657 | INFO  | View Message Processor       | buted.internal.DistributionManager | --> Admitting member <10.237.110.194( Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).
2021-12-21T10:48:39,658 | INFO  | pool-3-thread-2              | e.internal.cache.DistributedRegion | --> Initializing region _monitoringRegion_10.237.110.194<v186>1024
2021-12-21T10:48:54,657 | WARN  | pool-3-thread-2              | tributed.internal.ReplyProcessor21 | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 78988 waiting for 1 replies from [10.237.110.194( Server serverA:10242)<ec><v186>:1024]> on 10.237.110.225( Management:6033)<ec><v111>:1024 whose current membership list is: [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( Management:6033)<ec><v111>:1024, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.194( Server serverA:10242)<ec><v186>:1024]]

 

> Unable to reconnect a node after SO patching "15 seconds have elapsed while 
> waiting for replies"
> ------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9906
>                 URL: https://issues.apache.org/jira/browse/GEODE-9906
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Marco Baldessari
>            Priority: Major
>
> I have a working cluster of 4 nodes in total: 3 servers and 1 management
> node.
> At the beginning of the month we scheduled OS patching and started with the
> first server node, using this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service on the first patched node, "serverA", fails to restart with
> this error:
> Log entries during cluster join:
> serverA:
> | INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN  | region-dm-12               | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer  10.237.110.195( Server serverB:9993)<ec><v127>:1024
>  
> ServerMgmt:
> | WARN  | pool-3-thread-1              | tributed.internal.ReplyProcessor21 | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225( Management:6033)<ec><v111>:1024 whose current membership list is: [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( Management:6033)<ec><v111>:1024, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.194( Server serverA:632)<ec><v174>:1024]]
>  
> We verified connectivity between the systems with tcpdump; UDP port 1024 is
> working fine.
>  
> We have tried redeploying the service and retried many times, but we
> always get the same error during startup.
> Any idea? Thank you.
> Marco.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
