[ 
https://issues.apache.org/jira/browse/SOLR-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392763#comment-14392763
 ] 

Mark Miller commented on SOLR-7338:
-----------------------------------

bq. "isReloaded" feels more like "isReloading"

What's the gain, what's the point, what's the alternative?

I don't get that at all. It tells you if the core has been reloaded. This is 
often useful in things that happen on creating a new SolrCore.

Who cares about isReloading? I'm lost.

Is it just too difficult to understand what isReloaded means?

I'd be more confused with this temporary isReloading call - seems so easy for 
that to be tricky. isReloaded is so permanent and easy to understand. The core 
has been reloaded or it hasn't. How the heck does trying to track exactly when 
the core is actually in the process of reload or not more useful?

Anyone else?

> A reloaded core will never register itself as active after a ZK session 
> expiration
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-7338
>                 URL: https://issues.apache.org/jira/browse/SOLR-7338
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Mark Miller
>         Attachments: SOLR-7338.patch, SOLR-7338_test.patch
>
>
> If a collection gets reloaded, then a core's isReloaded flag is always true. 
> If a core experiences a ZK session expiration after a reload, then it won't 
> ever be able to set itself to active because of the check in 
> {{ZkController#register}}:
> {code}
>         UpdateLog ulog = core.getUpdateHandler().getUpdateLog();
>         if (!core.isReloaded() && ulog != null) {
>           // disable recovery in case shard is in construction state (for 
> shard splits)
>           Slice slice = getClusterState().getSlice(collection, shardId);
>           if (slice.getState() != Slice.State.CONSTRUCTION || !isLeader) {
>             Future<UpdateLog.RecoveryInfo> recoveryFuture = 
> core.getUpdateHandler().getUpdateLog().recoverFromLog();
>             if (recoveryFuture != null) {
>               log.info("Replaying tlog for " + ourUrl + " during startup... 
> NOTE: This can take a while.");
>               recoveryFuture.get(); // NOTE: this could potentially block for
>               // minutes or more!
>               // TODO: public as recovering in the mean time?
>               // TODO: in the future we could do peersync in parallel with 
> recoverFromLog
>             } else {
>               log.info("No LogReplay needed for core=" + core.getName() + " 
> baseURL=" + baseUrl);
>             }
>           }
>           boolean didRecovery = checkRecovery(coreName, desc, 
> recoverReloadedCores, isLeader, cloudDesc,
>               collection, coreZkNodeName, shardId, leaderProps, core, cc);
>           if (!didRecovery) {
>             publish(desc, ZkStateReader.ACTIVE);
>           }
>         }
> {code}
> I can easily simulate this on trunk by doing:
> {code}
> bin/solr -c -z localhost:2181
> bin/solr create -c foo
> bin/post -c foo example/exampledocs/*.xml
> curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=foo";
> kill -STOP <PID> && sleep <PAUSE_SECONDS> && kill -CONT <PID>
> {code}
> Where <PID> is the process ID of the Solr node. Here are the logs after the 
> CONT command. As you can see below, the core never gets to setting itself as 
> active again. I think the bug is that the isReloaded flag needs to get set 
> back to false once the reload is successful, but I don't understand what this 
> flag is needed for anyway???
> {code}
> INFO  - 2015-04-01 17:28:50.962; 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 
> org.apache.solr.common.cloud.ConnectionManager@5519dba0 
> name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent 
> state:Disconnected type:None path:null path:null type:None
> INFO  - 2015-04-01 17:28:50.963; 
> org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
> INFO  - 2015-04-01 17:28:51.107; 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 
> org.apache.solr.common.cloud.ConnectionManager@5519dba0 
> name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent 
> state:Expired type:None path:null path:null type:None
> INFO  - 2015-04-01 17:28:51.107; 
> org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper 
> session was expired. Attempting to reconnect to recover relationship with 
> ZooKeeper...
> INFO  - 2015-04-01 17:28:51.108; org.apache.solr.cloud.Overseer; Overseer 
> (id=93579450724974592-192.168.1.2:8983_solr-n_0000000000) closing
> INFO  - 2015-04-01 17:28:51.108; 
> org.apache.solr.cloud.ZkController$WatcherImpl; A node got unwatched for 
> /configs/foo
> INFO  - 2015-04-01 17:28:51.108; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Overseer Loop exiting : 
> 192.168.1.2:8983_solr
> INFO  - 2015-04-01 17:28:51.109; 
> org.apache.solr.cloud.OverseerCollectionProcessor; According to ZK I 
> (id=93579450724974592-192.168.1.2:8983_solr-n_0000000000) am no longer a 
> leader.
> INFO  - 2015-04-01 17:28:51.108; org.apache.solr.cloud.ZkController$4; 
> Running listeners for /configs/foo
> INFO  - 2015-04-01 17:28:51.109; 
> org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - 
> starting a new one...
> INFO  - 2015-04-01 17:28:51.109; org.apache.solr.core.SolrCore$11; config 
> update listener called for core foo_shard1_replica1
> INFO  - 2015-04-01 17:28:51.110; 
> org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect 
> to ZooKeeper
> ERROR - 2015-04-01 17:28:51.110; org.apache.solr.common.SolrException; 
> OverseerAutoReplicaFailoverThread had an error in its thread work 
> loop.:org.apache.solr.common.SolrException: Error reading cluster properties
>       at 
> org.apache.solr.common.cloud.ZkStateReader.getClusterProps(ZkStateReader.java:772)
>       at 
> org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.doWork(OverseerAutoReplicaFailoverThread.java:150)
>       at 
> org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.run(OverseerAutoReplicaFailoverThread.java:129)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException: sleep interrupted
>       at java.lang.Thread.sleep(Native Method)
>       at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:108)
>       at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:76)
>       at 
> org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
>       at 
> org.apache.solr.common.cloud.ZkStateReader.getClusterProps(ZkStateReader.java:765)
>       ... 3 more
> WARN  - 2015-04-01 17:28:51.110; org.apache.solr.cloud.ZkController$4; 
> listener throws error
> org.apache.solr.common.SolrException: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /configs/foo/params.json
>       at 
> org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:160)
>       at 
> org.apache.solr.core.SolrConfig.refreshRequestParams(SolrConfig.java:907)
>       at org.apache.solr.core.SolrCore$11.run(SolrCore.java:2503)
>       at org.apache.solr.cloud.ZkController$4.run(ZkController.java:2351)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /configs/foo/params.json
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>       at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:294)
>       at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:291)
>       at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>       at 
> org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:291)
>       at 
> org.apache.solr.core.RequestParams.getFreshRequestParams(RequestParams.java:150)
>       ... 3 more
> ERROR - 2015-04-01 17:28:51.110; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; could not read the data
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /overseer_elect/leader
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>       at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>       at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>       at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>       at 
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>       at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:298)
>       at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.access$300(Overseer.java:83)
>       at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater$2.run(Overseer.java:259)
> INFO  - 2015-04-01 17:28:51.114; 
> org.apache.solr.common.cloud.ConnectionManager; Watcher 
> org.apache.solr.common.cloud.ConnectionManager@5519dba0 
> name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent 
> state:SyncConnected type:None path:null path:null type:None
> INFO  - 2015-04-01 17:28:51.115; 
> org.apache.solr.common.cloud.ConnectionManager; Client is connected to 
> ZooKeeper
> INFO  - 2015-04-01 17:28:51.115; 
> org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper 
> reestablished.
> INFO  - 2015-04-01 17:28:51.115; org.apache.solr.cloud.ZkController$1; 
> ZooKeeper session re-connected ... refreshing core states after session 
> expiration.
> INFO  - 2015-04-01 17:28:51.115; org.apache.solr.cloud.ZkController; 
> publishing core=foo_shard1_replica1 state=down collection=foo
> INFO  - 2015-04-01 17:28:51.118; org.apache.solr.cloud.ElectionContext; 
> canceling election 
> /overseer_elect/election/93579450724974592-192.168.1.2:8983_solr-n_0000000000
> WARN  - 2015-04-01 17:28:51.119; org.apache.solr.cloud.ElectionContext; 
> cancelElection did not find election node to remove 
> /overseer_elect/election/93579450724974592-192.168.1.2:8983_solr-n_0000000000
> INFO  - 2015-04-01 17:28:51.121; 
> org.apache.solr.cloud.OverseerElectionContext; I am going to be the leader 
> 192.168.1.2:8983_solr
> INFO  - 2015-04-01 17:28:51.121; org.apache.solr.common.cloud.SolrZkClient; 
> makePath: /overseer_elect/leader
> INFO  - 2015-04-01 17:28:51.122; org.apache.solr.cloud.Overseer; Overseer 
> (id=93579450724974594-192.168.1.2:8983_solr-n_0000000001) starting
> INFO  - 2015-04-01 17:28:51.128; 
> org.apache.solr.cloud.OverseerAutoReplicaFailoverThread; Starting 
> OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000 
> autoReplicaFailoverWaitAfterExpiration=30000 
> autoReplicaFailoverBadNodeExpiration=60000
> INFO  - 2015-04-01 17:28:51.128; 
> org.apache.solr.cloud.OverseerCollectionProcessor; Process current queue of 
> collection creations
> INFO  - 2015-04-01 17:28:51.128; org.apache.solr.common.cloud.ZkStateReader; 
> Updating cluster state from ZooKeeper... 
> INFO  - 2015-04-01 17:28:51.129; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Starting to work on the 
> main queue
> INFO  - 2015-04-01 17:28:51.130; org.apache.solr.common.cloud.ZkStateReader; 
> addZkWatch foo
> INFO  - 2015-04-01 17:28:51.131; org.apache.solr.common.cloud.ZkStateReader; 
> Updating collection state at /collections/foo/state.json from ZooKeeper... 
> INFO  - 2015-04-01 17:28:51.131; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 4 
> INFO  - 2015-04-01 17:28:51.131; org.apache.solr.cloud.ZkController; Register 
> node as live in ZooKeeper:/live_nodes/192.168.1.2:8983_solr
> INFO  - 2015-04-01 17:28:51.132; org.apache.solr.common.cloud.SolrZkClient; 
> makePath: /live_nodes/192.168.1.2:8983_solr
> INFO  - 2015-04-01 17:28:51.133; org.apache.solr.cloud.ZkController; Register 
> replica - core:foo_shard1_replica1 address:http://192.168.1.2:8983/solr 
> collection:foo shard:shard1
> INFO  - 2015-04-01 17:28:51.133; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 4 
> INFO  - 2015-04-01 17:28:51.133; org.apache.solr.cloud.ElectionContext; 
> canceling election 
> /collections/foo/leader_elect/shard1/election/93579450724974592-core_node1-n_0000000000
> WARN  - 2015-04-01 17:28:51.134; org.apache.solr.cloud.ElectionContext; 
> cancelElection did not find election node to remove 
> /collections/foo/leader_elect/shard1/election/93579450724974592-core_node1-n_0000000000
> INFO  - 2015-04-01 17:28:51.134; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage: 
> queueSize: 1, message = {
>   "core":"foo_shard1_replica1",
>   "core_node_name":"core_node1",
>   "roles":null,
>   "base_url":"http://192.168.1.2:8983/solr";,
>   "node_name":"192.168.1.2:8983_solr",
>   "numShards":"1",
>   "state":"down",
>   "shard":"shard1",
>   "collection":"foo",
>   "operation":"state"} current state version: 1
> INFO  - 2015-04-01 17:28:51.135; 
> org.apache.solr.cloud.overseer.ReplicaMutator; Update state numShards=1 
> message={
>   "core":"foo_shard1_replica1",
>   "core_node_name":"core_node1",
>   "roles":null,
>   "base_url":"http://192.168.1.2:8983/solr";,
>   "node_name":"192.168.1.2:8983_solr",
>   "numShards":"1",
>   "state":"down",
>   "shard":"shard1",
>   "collection":"foo",
>   "operation":"state"}
> INFO  - 2015-04-01 17:28:51.136; 
> org.apache.solr.cloud.ShardLeaderElectionContext; Running the leader process 
> for shard shard1
> INFO  - 2015-04-01 17:28:51.136; 
> org.apache.solr.cloud.overseer.ZkStateWriter; going to update_collection 
> /collections/foo/state.json version: 4
> INFO  - 2015-04-01 17:28:51.136; org.apache.solr.cloud.ActionThrottle; The 
> last leader attempt started 198060ms ago.
> INFO  - 2015-04-01 17:28:51.137; 
> org.apache.solr.common.cloud.ZkStateReader$7; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/collections/foo/state.json for collection foo has occurred - 
> updating... (live nodes size: 1)
> INFO  - 2015-04-01 17:28:51.137; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 5 
> INFO  - 2015-04-01 17:28:51.138; 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/queue state SyncConnected
> INFO  - 2015-04-01 17:28:51.138; 
> org.apache.solr.cloud.ShardLeaderElectionContext; Enough replicas found to 
> continue.
> INFO  - 2015-04-01 17:28:51.138; 
> org.apache.solr.cloud.ShardLeaderElectionContext; I may be the new leader - 
> try and sync
> INFO  - 2015-04-01 17:28:51.138; org.apache.solr.cloud.SyncStrategy; Sync 
> replicas to http://192.168.1.2:8983/solr/foo_shard1_replica1/
> INFO  - 2015-04-01 17:28:51.138; org.apache.solr.cloud.SyncStrategy; Sync 
> Success - now sync replicas to me
> INFO  - 2015-04-01 17:28:51.138; org.apache.solr.cloud.SyncStrategy; 
> http://192.168.1.2:8983/solr/foo_shard1_replica1/ has no replicas
> INFO  - 2015-04-01 17:28:51.139; 
> org.apache.solr.cloud.ShardLeaderElectionContext; I am the new leader: 
> http://192.168.1.2:8983/solr/foo_shard1_replica1/ shard1
> INFO  - 2015-04-01 17:28:51.139; org.apache.solr.common.cloud.SolrZkClient; 
> makePath: /collections/foo/leaders/shard1
> INFO  - 2015-04-01 17:28:51.139; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage: 
> queueSize: 1, message = {
>   "operation":"leader",
>   "shard":"shard1",
>   "collection":"foo"} current state version: 1
> INFO  - 2015-04-01 17:28:51.140; 
> org.apache.solr.cloud.overseer.ZkStateWriter; going to update_collection 
> /collections/foo/state.json version: 5
> INFO  - 2015-04-01 17:28:51.141; 
> org.apache.solr.common.cloud.ZkStateReader$7; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/collections/foo/state.json for collection foo has occurred - 
> updating... (live nodes size: 1)
> INFO  - 2015-04-01 17:28:51.142; 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/queue state SyncConnected
> INFO  - 2015-04-01 17:28:51.142; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 6 
> INFO  - 2015-04-01 17:28:51.144; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage: 
> queueSize: 1, message = {
>   "operation":"leader",
>   "shard":"shard1",
>   "collection":"foo",
>   "base_url":"http://192.168.1.2:8983/solr";,
>   "core":"foo_shard1_replica1",
>   "state":"active"} current state version: 1
> INFO  - 2015-04-01 17:28:51.145; 
> org.apache.solr.cloud.overseer.ZkStateWriter; going to update_collection 
> /collections/foo/state.json version: 6
> INFO  - 2015-04-01 17:28:51.145; 
> org.apache.solr.common.cloud.ZkStateReader$7; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/collections/foo/state.json for collection foo has occurred - 
> updating... (live nodes size: 1)
> INFO  - 2015-04-01 17:28:51.146; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 7 
> INFO  - 2015-04-01 17:28:51.147; 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/queue state SyncConnected
> INFO  - 2015-04-01 17:28:51.196; org.apache.solr.cloud.ZkController; We are 
> http://192.168.1.2:8983/solr/foo_shard1_replica1/ and leader is 
> http://192.168.1.2:8983/solr/foo_shard1_replica1/
> INFO  - 2015-04-01 17:28:51.196; org.apache.solr.cloud.ZkController; 
>  In register, core.isReloaded? true 
> INFO  - 2015-04-01 17:28:51.199; org.apache.solr.common.cloud.ZkStateReader; 
> Updating data for foo to ver 7 
> INFO  - 2015-04-01 17:28:51.199; org.apache.solr.cloud.ZkController; watch 
> zkdir /configs/foo
> INFO  - 2015-04-01 17:28:51.199; org.apache.solr.schema.ZkIndexSchemaReader; 
> Creating ZooKeeper watch for the managed schema at /configs/foo/managed-schema
> INFO  - 2015-04-01 17:28:51.199; org.apache.solr.cloud.ZkController$4; 
> Running listeners for /configs/foo
> INFO  - 2015-04-01 17:28:51.200; org.apache.solr.core.SolrCore$11; config 
> update listener called for core foo_shard1_replica1
> INFO  - 2015-04-01 17:28:51.200; org.apache.solr.schema.ZkIndexSchemaReader; 
> Current schema version 5 is already the latest
> INFO  - 2015-04-01 17:28:51.200; org.apache.solr.schema.ZkIndexSchemaReader; 
> Creating ZooKeeper watch for the managed schema at /configs/foo/managed-schema
> INFO  - 2015-04-01 17:28:51.200; org.apache.solr.core.SolrConfig; current 
> version of requestparams : 0
> INFO  - 2015-04-01 17:28:51.201; org.apache.solr.schema.ZkIndexSchemaReader; 
> Current schema version 5 is already the latest
> INFO  - 2015-04-01 17:28:51.201; 
> org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to 
> ZooKeeper
> INFO  - 2015-04-01 17:28:51.202; 
> org.apache.solr.common.cloud.ConnectionManager; Connected:true
> INFO  - 2015-04-01 17:28:51.211; 
> org.apache.solr.cloud.ZkController$WatcherImpl; A node got unwatched for 
> /configs/foo
> INFO  - 2015-04-01 17:28:51.211; org.apache.solr.cloud.ZkController$4; 
> Running listeners for /configs/foo
> INFO  - 2015-04-01 17:28:51.211; org.apache.solr.core.SolrCore$11; config 
> update listener called for core foo_shard1_replica1
> INFO  - 2015-04-01 17:28:51.212; org.apache.solr.core.SolrConfig; current 
> version of requestparams : 0
> INFO  - 2015-04-01 17:29:07.080; org.apache.solr.servlet.SolrDispatchFilter; 
> [admin] webapp=null path=/admin/cores 
> params={indexInfo=false&wt=json&_=1427909347078} status=0 QTime=0 
> INFO  - 2015-04-01 17:29:07.112; org.apache.solr.servlet.SolrDispatchFilter; 
> [admin] webapp=null path=/admin/info/system params={wt=json&_=1427909347099} 
> status=0 QTime=11 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to