[
https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466018#comment-17466018
]
Marco Baldessari commented on GEODE-9906:
-----------------------------------------
The Geode version is {*}1.0.0-incubating{*}. The full logs of the involved nodes are below.
*serverA*
2021-12-21T10:48:37,572 | INFO | region-dm-12 |
stributed.internal.InternalLocator | --> Starting peer location for
Distribution Locator on serverA/10.237.110.194[0]
2021-12-21T10:48:37,575 | INFO | region-dm-12 |
stributed.internal.InternalLocator | --> Starting Distribution Locator on
serverA/10.237.110.194[0]
2021-12-21T10:48:37,575 | INFO | region-dm-12 |
buted.internal.tcpserver.TcpServer | --> Locator was created at Tue Dec 21
10:48:37 CET 2021
2021-12-21T10:48:37,575 | INFO | region-dm-12 |
buted.internal.tcpserver.TcpServer | --> Listening on port 2222 bound on
address serverA/10.237.110.194
2021-12-21T10:48:37,576 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> GemFire peer location service
starting. Other locators:
serverA[2222],serverB[2222],serverC[2222],Management[2424] Locators preferred
as coordinators: true Network partition detection enabled: true View
persistence file: locator2222view.dat
2021-12-21T10:48:37,576 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator attempting to recover
from serverA/10.237.110.194:2222
2021-12-21T10:48:37,881 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator was unable to recover
state from this locator
2021-12-21T10:48:37,881 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator attempting to recover
from serverB/10.237.110.195:2222
2021-12-21T10:48:38,134 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator recovered initial
membership of View[10.237.110.225( Management:6033)<ec><v111>:1024|185]
members: [10.237.110.225( Management:6033)<ec><v111>:1024\{lead},
10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024] crashed: [10.237.110.194( Server
serverA:636)<ec><v184>:1024]
2021-12-21T10:48:38,134 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator recovered state from
serverB/10.237.110.195:2222
2021-12-21T10:48:38,146 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Serial Queue info : THROTTLE_PERCENT:
0.75 SERIAL_QUEUE_BYTE_LIMIT :41943040 SERIAL_QUEUE_THROTTLE :31457280
TOTAL_SERIAL_QUEUE_BYTE_LIMIT :83886080 TOTAL_SERIAL_QUEUE_THROTTLE :31457280
SERIAL_QUEUE_SIZE_LIMIT :20000 SERIAL_QUEUE_SIZE_THROTTLE :15000
2021-12-21T10:48:38,387 | INFO | region-dm-12 |
.membership.gms.locator.GMSLocator | --> Peer locator is connecting to local
membership services
2021-12-21T10:48:38,387 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> Starting membership services
2021-12-21T10:48:38,463 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> JGroups channel created (took 76ms)
2021-12-21T10:48:38,479 | INFO | region-dm-12 |
uted.internal.direct.DirectChannel | --> GemFire P2P Listener started on null
2021-12-21T10:48:38,479 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> This member is hosting a locator will
be preferred as a membership coordinator
2021-12-21T10:48:38,482 | INFO | re Detection Server thread 0 |
d.internal.membership.gms.Services | --> Started failure detection server
thread on /10.237.110.194:19535.
2021-12-21T10:48:39,333 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> Attempting to join the distributed
system through coordinator 10.237.110.225( Management:6033)<ec><v111>:1024
using address 10.237.110.194( Server serverA:10242)<ec>:1024
2021-12-21T10:48:39,649 | INFO | t receiver,serverA-33868 |
d.internal.membership.gms.Services | --> received new view:
View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
old view is: null
2021-12-21T10:48:39,650 | INFO | t receiver,serverA-33868 |
.membership.gms.locator.GMSLocator | --> Peer locator received new membership
view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,656 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> Joined the distributed system (took
1'170 ms)
2021-12-21T10:48:39,656 | INFO | region-dm-12 |
d.internal.membership.gms.Services | --> Finished joining (took 1170ms).
2021-12-21T10:48:39,657 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Starting DistributionManager
10.237.110.194( Server serverA:10242)<ec><v186>:1024. (took 1491 ms)
2021-12-21T10:48:39,659 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Initial (distribution manager) view =
View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,659 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Admitting member <10.237.110.225(
Management:6033)<ec><v111>:1024>. Now there are 1 non-admin member(s).
2021-12-21T10:48:39,659 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Admitting member <10.237.110.195(
Server serverB:9993)<ec><v127>:1024>. Now there are 2 non-admin member(s).
2021-12-21T10:48:39,659 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Admitting member <10.237.110.196(
Server serverC:16805)<ec><v136>:1024>. Now there are 3 non-admin member(s).
2021-12-21T10:48:39,659 | INFO | region-dm-12 |
buted.internal.DistributionManager | --> Admitting member <10.237.110.194(
Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).
2021-12-21T10:50:49,787 | INFO | region-dm-12 |
ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false
failed to connect to peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024
because: java.net.ConnectException: Connection timed out (Connection timed out)
2021-12-21T10:50:51,797 | WARN | region-dm-12 |
ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to
peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024
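The last two serverA entries are logged by the tcp.Connection class and report a java.net.ConnectException, so the failing step is a TCP connect to serverB, whereas the tcpdump check mentioned in the issue description covered UDP 1024. A minimal sketch of a TCP reachability probe that could be run from serverA follows. The class name TcpProbe and the 5 second timeout are illustrative assumptions, and the port argument is a placeholder, since the TCP listener port of serverB is not shown in these logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Geode: checks plain TCP reachability.
final class TcpProbe {

    /** Returns true if a TCP connection to host:port is established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, unknown host and timed-out connects all end up here.
            System.err.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java TcpProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = canConnect(host, port, 5000); // 5 s timeout is an arbitrary choice
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

Running it in both directions (serverA to serverB and serverB to serverA) against the actual peer-to-peer TCP ports would show whether only one direction is blocked after the patching.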
*serverB*
2021-12-21T10:48:39,657 | INFO | t receiver,serverB-42516 |
d.internal.membership.gms.Services | --> received new view:
View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
old view is: View[10.237.110.225( Management:6033)<ec><v111>:1024|185] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024] crashed: [10.237.110.194( Server
serverA:636)<ec><v184>:1024]
2021-12-21T10:48:39,661 | INFO | t receiver,serverB-42516 |
.membership.gms.locator.GMSLocator | --> Peer locator received new membership
view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,662 | INFO | View Message Processor |
d.internal.membership.gms.Services | --> Membership: Processing addition <
10.237.110.194( Server serverA:10242)<ec><v186>:1024 >
2021-12-21T10:48:39,662 | INFO | View Message Processor |
buted.internal.DistributionManager | --> Admitting member <10.237.110.194(
Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).
*Management*
2021-12-21T10:48:39,342 | INFO | st receiver,Management-46835 |
d.internal.membership.gms.Services | --> received join request from
10.237.110.194( Server serverA:10242)<ec>:1024
2021-12-21T10:48:39,647 | INFO | eode Membership View Creator |
d.internal.membership.gms.Services | --> View Creator is processing 1 requests
for the next membership view
2021-12-21T10:48:39,648 | INFO | eode Membership View Creator |
d.internal.membership.gms.Services | --> preparing new view
View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
failure detection ports: 41363 30800 1768 19535
2021-12-21T10:48:39,656 | INFO | eode Membership View Creator |
d.internal.membership.gms.Services | --> finished waiting for responses to view
preparation
2021-12-21T10:48:39,657 | INFO | eode Membership View Creator |
d.internal.membership.gms.Services | --> received new view:
View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
old view is: View[10.237.110.225( Management:6033)<ec><v111>:1024|185] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024] crashed: [10.237.110.194( Server
serverA:636)<ec><v184>:1024]
2021-12-21T10:48:39,657 | INFO | eode Membership View Creator |
.membership.gms.locator.GMSLocator | --> Peer locator received new membership
view: View[10.237.110.225( Management:6033)<ec><v111>:1024|186] members:
[10.237.110.225( Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
2021-12-21T10:48:39,657 | INFO | eode Membership View Creator |
d.internal.membership.gms.Services | --> sending new view View[10.237.110.225(
Management:6033)<ec><v111>:1024|186] members: [10.237.110.225(
Management:6033)<ec><v111>:1024\{lead}, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.196( Server
serverC:16805)<ec><v136>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]
failure detection ports: 41363 30800 1768 19535
2021-12-21T10:48:39,657 | INFO | View Message Processor |
d.internal.membership.gms.Services | --> Membership: Processing addition <
10.237.110.194( Server serverA:10242)<ec><v186>:1024 >
2021-12-21T10:48:39,657 | INFO | View Message Processor |
buted.internal.DistributionManager | --> Admitting member <10.237.110.194(
Server serverA:10242)<ec><v186>:1024>. Now there are 4 non-admin member(s).
2021-12-21T10:48:39,658 | INFO | pool-3-thread-2 |
e.internal.cache.DistributedRegion | --> Initializing region
_monitoringRegion_10.237.110.194<v186>1024
2021-12-21T10:48:54,657 | WARN | pool-3-thread-2 |
tributed.internal.ReplyProcessor21 | --> 15 seconds have elapsed while waiting
for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 78988 waiting
for 1 replies from [10.237.110.194( Server serverA:10242)<ec><v186>:1024]> on
10.237.110.225( Management:6033)<ec><v111>:1024 whose current membership list
is: [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225(
Management:6033)<ec><v111>:1024, 10.237.110.195( Server
serverB:9993)<ec><v127>:1024, 10.237.110.194( Server
serverA:10242)<ec><v186>:1024]]
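The member identifiers in the views above appear to follow the pattern host(name:pid)<ec><vN>:port. On that reading, serverA shows up as serverA:636 with <v184> in the crashed list and as serverA:10242 with <v186> after the restart, i.e. two different incarnations on the same host. A minimal Java sketch for splitting such identifiers when comparing views follows. The class name MemberId and the field meanings (process id, coordinator-eligible flag, view id, membership port) are assumptions drawn from the log format, not a Geode API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for member identifiers as they are printed in the logs above.
final class MemberId {
    // host( name:pid)<ec><vN>:port ; <ec> and <vN> are treated as optional.
    private static final Pattern FORMAT = Pattern.compile(
        "^([^(\\s]+)\\(\\s*(.*?):(\\d+)\\)(<ec>)?(?:<v(\\d+)>)?:(\\d+)$");

    final String host;
    final String name;
    final int pid;                     // assumed to be the process id
    final boolean coordinatorEligible; // assumed meaning of <ec>
    final int viewId;                  // -1 when no <vN> is present (e.g. in a join request)
    final int port;                    // assumed to be the membership port

    private MemberId(String host, String name, int pid,
                     boolean coordinatorEligible, int viewId, int port) {
        this.host = host;
        this.name = name;
        this.pid = pid;
        this.coordinatorEligible = coordinatorEligible;
        this.viewId = viewId;
        this.port = port;
    }

    static MemberId parse(String text) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized member id: " + text);
        }
        int viewId = m.group(5) == null ? -1 : Integer.parseInt(m.group(5));
        return new MemberId(m.group(1), m.group(2).trim(), Integer.parseInt(m.group(3)),
                m.group(4) != null, viewId, Integer.parseInt(m.group(6)));
    }

    public static void main(String[] args) {
        MemberId before = parse("10.237.110.194( Server serverA:636)<ec><v184>:1024");
        MemberId after = parse("10.237.110.194( Server serverA:10242)<ec><v186>:1024");
        // Same host, different pid and view id: the restarted process is a new member.
        System.out.println(before.host.equals(after.host)
                && (before.pid != after.pid || before.viewId != after.viewId));
    }
}
```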
> Unable to reconnect a node after SO patching "15 seconds have elapsed while
> waiting for replies"
> ------------------------------------------------------------------------------------------------
>
> Key: GEODE-9906
> URL: https://issues.apache.org/jira/browse/GEODE-9906
> Project: Geode
> Issue Type: Bug
> Reporter: Marco Baldessari
> Priority: Major
>
> I have a cluster of 4 nodes in total (3 servers and 1 management node)
> that was working properly.
> At the beginning of the month we planned to patch the OS and we started from
> the first server node with this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service of the first patched node named "serverA" fails to restart with
> this error:
> Log entries cluster join:
> serverA:
> | INFO | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: shared=true ordered=false failed to connect to peer
> 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because:
> java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN | region-dm-12 | ache.geode.internal.tcp.Connection |
> --> Connection: Attempting reconnect to peer 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024
>
> ServerMgmt:
> | WARN | pool-3-thread-1 | tributed.internal.ReplyProcessor21
> | --> 15 seconds have elapsed while waiting for replies:
> <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies
> from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225(
> Management:6033)<ec><v111>:1024 whose current membership list is:
> [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225(
> Management:6033)<ec><v111>:1024, 10.237.110.195( Server
> serverB:9993)<ec><v127>:1024, 10.237.110.194( Server
> serverA:632)<ec><v174>:1024]]
>
> The connection between the systems was verified with tcpdump; UDP port 1024
> is working fine.
>
> We have tried redeploying the service and making numerous attempts but we
> always get the same error during startup.
> Any ideas? Thank you.
> Marco.