[ 
https://issues.apache.org/jira/browse/GEODE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Baker closed GEODE-1992.
--------------------------------

> When using locator-wait-time and cluster configuration causes error on server 
> start.
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-1992
>                 URL: https://issues.apache.org/jira/browse/GEODE-1992
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration
>            Reporter: Jens Deppe
>            Priority: Major
>
> I'm trying to speed up a cluster start up by using the locator-wait-time 
> property and launching all members asynchronously. Unfortunately there seems 
> to be a race when cluster configuration is enabled (which it is by default).
> On the server the logs show the following:
> {noformat}
> [info 2016/10/11 21:40:02.229 UTC test-cluster-server-0 <main> tid=0x1] 
> Starting membership services
> [info 2016/10/11 21:40:02.658 UTC test-cluster-server-0 <main> tid=0x1] 
> JGroups channel created (took 429ms)
> [info 2016/10/11 21:40:02.727 UTC test-cluster-server-0 <main> tid=0x1] 
> GemFire P2P Listener started on  null
> [info 2016/10/11 21:40:02.734 UTC test-cluster-server-0 <Geode Failure 
> Detection Server thread 0> tid=0x20] Started failure detection server thread 
> on /172.17.0.3:51937.
> [info 2016/10/11 21:40:04.374 UTC test-cluster-server-0 <main> tid=0x1] 
> Attempting to join the distributed system through coordinator 
> 172.17.0.2(test-cluster-locator-0:62:locator)<ec>:1024 using address 
> 172.17.0.3(test-cluster-server-0:34):1024
> [info 2016/10/11 21:40:04.749 UTC test-cluster-server-0 <unicast 
> receiver,test-cluster-server-0-22057> tid=0x1c] received new view: 
> View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members: 
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024, 
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead}, 
> 172.17.0.3(test-cluster-server-0:34)<v1>:1024, 
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
>   old view is: null
> [info 2016/10/11 21:40:04.771 UTC test-cluster-server-0 <main> tid=0x1] 
> Finished joining (took 2006ms).
> [info 2016/10/11 21:40:04.773 UTC test-cluster-server-0 <main> tid=0x1] 
> Starting DistributionManager 172.17.0.3(test-cluster-server-0:34)<v1>:1024.  
> (took 3025 ms)
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1] 
> Initial (distribution manager) view =  
> View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members: 
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024, 
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead}, 
> 172.17.0.3(test-cluster-server-0:34)<v1>:1024, 
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1] 
> Admitting member 
> <172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024>. Now there are 1 
> non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1] 
> Admitting member <172.17.0.4(test-cluster-server-1:35)<v1>:1024>. Now there 
> are 2 non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1] 
> Admitting member <172.17.0.3(test-cluster-server-0:34)<v1>:1024>. Now there 
> are 3 non-admin member(s).
> [info 2016/10/11 21:40:04.786 UTC test-cluster-server-0 <main> tid=0x1] 
> Admitting member <172.17.0.5(test-cluster-server-2:34)<v1>:1024>. Now there 
> are 4 non-admin member(s).
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority 
> Message Processor 2> tid=0x2a] Member 
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is not equivalent or in the 
> same redundancy zone.
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority 
> Message Processor 1> tid=0x2b] Member 
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is not equivalent or in the 
> same redundancy zone.
> [info 2016/10/11 21:40:04.998 UTC test-cluster-server-0 <P2P message reader 
> for 172.17.0.4(test-cluster-server-1:35)<v1>:1024 shared unordered uid=1 
> port=38682> tid=0x25] Member 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is 
> not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.020 UTC test-cluster-server-0 <P2P message reader 
> for 172.17.0.5(test-cluster-server-2:34)<v1>:1024 shared unordered uid=1 
> port=59864> tid=0x27] Member 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is 
> not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.070 UTC test-cluster-server-0 <P2P message reader 
> for 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 shared 
> unordered uid=2 port=36482> tid=0x2e] Member 
> 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 is not equivalent 
> or in the same redundancy zone.
> [error 2016/10/11 21:40:05.545 UTC test-cluster-server-0 <main> tid=0x1] 
> org.apache.geode.GemFireConfigException: cluster configuration service not 
> available
> [info 2016/10/11 21:40:05.572 UTC test-cluster-server-0 <Distributed system 
> shutdown hook> tid=0xf] VM is exiting - shutting down distributed system
> {noformat}
> The locator log shows the following:
> {noformat}
> [info 2016/10/11 21:40:06.035 UTC test-cluster-locator-0 <Geode Membership 
> View Creator> tid=0x27] Peer locator received new membership view: 
> View[172.17.0.2(test-cluster-
> locator-0:62:locator)<ec><v0>:1024|2] members: 
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024]  shutdown: 
> [172.17.0.3(test-cluster-server-0:34)<v1>:1024, 172.
> 17.0.5(test-cluster-server-2:34)<v1>:1024, 
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024]
> [info 2016/10/11 21:40:06.036 UTC test-cluster-locator-0 <Geode Membership 
> View Creator> tid=0x27] no recipients for new view aside from myself
> [info 2016/10/11 21:40:07.130 UTC test-cluster-locator-0 <main> tid=0x1] 
> SharedConfigStatusRequestHandler installed
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1] 
> Locator started on  172.17.0.2[10334]
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1] 
> Starting server location for Distribution Locator on 
> test-cluster-locator-0[10334]
> [info 2016/10/11 21:40:07.200 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Created disk store cluster_config with unique id 
> 7c8ab153c6ce48de-a9006bf9a
> 93a1069
> [info 2016/10/11 21:40:07.220 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] recovery region initialization took 3 ms
> [info 2016/10/11 21:40:07.249 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Created oplog#1 drf for disk store cluster_config.
> [info 2016/10/11 21:40:07.252 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Created oplog#1 crf for disk store cluster_config.
> [info 2016/10/11 21:40:07.257 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Initializing region _ConfigurationRegion
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Region /_ConfigurationRegion was created on this 
> member with the persistent
>  id 
> /172.17.0.2:/tmp/test-cluster-locator-0/ConfigDiskDir_test-cluster-locator-0 
> created at timestamp 1476222007256 version 0 diskStoreId 
> 7c8ab153c6ce48de-a9006bf9a93a1069
>  name test-cluster-locator-0.
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Initialization of region _ConfigurationRegion completed
> [info 2016/10/11 21:40:07.441 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Completed writing the shared configuration to 
> 'cluster_config' directory
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Getting Jar files from other locators
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] No other locators present
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] ConfigRequestHandler installed
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message 
> Processor 1> tid=0x48] Cluster configuration service start up completed 
> successfully and is now running ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to