[
https://issues.apache.org/jira/browse/GEODE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Barbara Pruijn resolved GEODE-1992.
-----------------------------------
Resolution: Won't Do
> Using locator-wait-time with cluster configuration causes an error on server
> start.
> ------------------------------------------------------------------------------------
>
> Key: GEODE-1992
> URL: https://issues.apache.org/jira/browse/GEODE-1992
> Project: Geode
> Issue Type: Bug
> Components: configuration
> Reporter: Jens Deppe
> Priority: Major
>
> I'm trying to speed up cluster startup by using the locator-wait-time
> property and launching all members asynchronously. Unfortunately, there seems
> to be a race condition when cluster configuration is enabled (which it is by
> default). On the server, the logs show the following:
> {noformat}
> [info 2016/10/11 21:40:02.229 UTC test-cluster-server-0 <main> tid=0x1]
> Starting membership services
> [info 2016/10/11 21:40:02.658 UTC test-cluster-server-0 <main> tid=0x1]
> JGroups channel created (took 429ms)
> [info 2016/10/11 21:40:02.727 UTC test-cluster-server-0 <main> tid=0x1]
> GemFire P2P Listener started on null
> [info 2016/10/11 21:40:02.734 UTC test-cluster-server-0 <Geode Failure
> Detection Server thread 0> tid=0x20] Started failure detection server thread
> on /172.17.0.3:51937.
> [info 2016/10/11 21:40:04.374 UTC test-cluster-server-0 <main> tid=0x1]
> Attempting to join the distributed system through coordinator
> 172.17.0.2(test-cluster-locator-0:62:locator)<ec>:1024 using address
> 172.17.0.3(test-cluster-server-0:34):1024
> [info 2016/10/11 21:40:04.749 UTC test-cluster-server-0 <unicast
> receiver,test-cluster-server-0-22057> tid=0x1c] received new view:
> View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members:
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024,
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead},
> 172.17.0.3(test-cluster-server-0:34)<v1>:1024,
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
> old view is: null
> [info 2016/10/11 21:40:04.771 UTC test-cluster-server-0 <main> tid=0x1]
> Finished joining (took 2006ms).
> [info 2016/10/11 21:40:04.773 UTC test-cluster-server-0 <main> tid=0x1]
> Starting DistributionManager 172.17.0.3(test-cluster-server-0:34)<v1>:1024.
> (took 3025 ms)
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1]
> Initial (distribution manager) view =
> View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members:
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024,
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead},
> 172.17.0.3(test-cluster-server-0:34)<v1>:1024,
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1]
> Admitting member
> <172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024>. Now there are 1
> non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1]
> Admitting member <172.17.0.4(test-cluster-server-1:35)<v1>:1024>. Now there
> are 2 non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1]
> Admitting member <172.17.0.3(test-cluster-server-0:34)<v1>:1024>. Now there
> are 3 non-admin member(s).
> [info 2016/10/11 21:40:04.786 UTC test-cluster-server-0 <main> tid=0x1]
> Admitting member <172.17.0.5(test-cluster-server-2:34)<v1>:1024>. Now there
> are 4 non-admin member(s).
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority
> Message Processor 2> tid=0x2a] Member
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is not equivalent or in the
> same redundancy zone.
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority
> Message Processor 1> tid=0x2b] Member
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is not equivalent or in the
> same redundancy zone.
> [info 2016/10/11 21:40:04.998 UTC test-cluster-server-0 <P2P message reader
> for 172.17.0.4(test-cluster-server-1:35)<v1>:1024 shared unordered uid=1
> port=38682> tid=0x25] Member 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is
> not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.020 UTC test-cluster-server-0 <P2P message reader
> for 172.17.0.5(test-cluster-server-2:34)<v1>:1024 shared unordered uid=1
> port=59864> tid=0x27] Member 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is
> not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.070 UTC test-cluster-server-0 <P2P message reader
> for 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 shared
> unordered uid=2 port=36482> tid=0x2e] Member
> 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 is not equivalent
> or in the same redundancy zone.
> [error 2016/10/11 21:40:05.545 UTC test-cluster-server-0 <main> tid=0x1]
> org.apache.geode.GemFireConfigException: cluster configuration service not
> available
> [info 2016/10/11 21:40:05.572 UTC test-cluster-server-0 <Distributed system
> shutdown hook> tid=0xf] VM is exiting - shutting down distributed system
> {noformat}
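> For context, locator-wait-time is a Geode member property that makes a
> starting member wait up to that many seconds for a locator to become
> reachable. A minimal sketch of the kind of asynchronous startup described
> above, assuming gfsh; the member names, port, and 60-second wait are
> illustrative, not taken from the report:
> {noformat}
> # gemfire.properties shared by all members (values illustrative)
> locators=test-cluster-locator-0[10334]
> locator-wait-time=60
>
> # launch the locator and servers concurrently, in the background
> gfsh start locator --name=test-cluster-locator-0 --properties-file=gemfire.properties &
> gfsh start server --name=test-cluster-server-0 --properties-file=gemfire.properties &
> gfsh start server --name=test-cluster-server-1 --properties-file=gemfire.properties &
> gfsh start server --name=test-cluster-server-2 --properties-file=gemfire.properties &
> {noformat}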
> The locator log shows the following:
> {noformat}
> [info 2016/10/11 21:40:06.035 UTC test-cluster-locator-0 <Geode Membership
> View Creator> tid=0x27] Peer locator received new membership view:
> View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|2] members:
> [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024] shutdown:
> [172.17.0.3(test-cluster-server-0:34)<v1>:1024,
> 172.17.0.5(test-cluster-server-2:34)<v1>:1024,
> 172.17.0.4(test-cluster-server-1:35)<v1>:1024]
> [info 2016/10/11 21:40:06.036 UTC test-cluster-locator-0 <Geode Membership
> View Creator> tid=0x27] no recipients for new view aside from myself
> [info 2016/10/11 21:40:07.130 UTC test-cluster-locator-0 <main> tid=0x1]
> SharedConfigStatusRequestHandler installed
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1]
> Locator started on 172.17.0.2[10334]
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1]
> Starting server location for Distribution Locator on
> test-cluster-locator-0[10334]
> [info 2016/10/11 21:40:07.200 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Created disk store cluster_config with unique id
> 7c8ab153c6ce48de-a9006bf9a93a1069
> [info 2016/10/11 21:40:07.220 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] recovery region initialization took 3 ms
> [info 2016/10/11 21:40:07.249 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Created oplog#1 drf for disk store cluster_config.
> [info 2016/10/11 21:40:07.252 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Created oplog#1 crf for disk store cluster_config.
> [info 2016/10/11 21:40:07.257 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Initializing region _ConfigurationRegion
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Region /_ConfigurationRegion was created on this
> member with the persistent id
> /172.17.0.2:/tmp/test-cluster-locator-0/ConfigDiskDir_test-cluster-locator-0
> created at timestamp 1476222007256 version 0 diskStoreId
> 7c8ab153c6ce48de-a9006bf9a93a1069 name test-cluster-locator-0.
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Initialization of region _ConfigurationRegion completed
> [info 2016/10/11 21:40:07.441 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Completed writing the shared configuration to
> 'cluster_config' directory
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Getting Jar files from other locators
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] No other locators present
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] ConfigRequestHandler installed
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message
> Processor 1> tid=0x48] Cluster configuration service start up completed
> successfully and is now running ....
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)