[ 
https://issues.apache.org/jira/browse/SOLR-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-6159.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.10
                   5.0

Thanks Steven!

> cancelElection fails on uninitialized ElectionContext
> -----------------------------------------------------
>
>                 Key: SOLR-6159
>                 URL: https://issues.apache.org/jira/browse/SOLR-6159
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.8.1
>            Reporter: Steven Bower
>            Assignee: Shalin Shekhar Mangar
>            Priority: Critical
>             Fix For: 5.0, 4.10
>
>         Attachments: SOLR-6159.patch, SOLR-6159.patch
>
>
> I had a solr collection that basically was out of memory (no exception, just 
> continuous 80-90 second full GCs). This of course is not a good state, but 
> when in this state ever time you come out of the GC your zookeeper session 
> has expired causing all kinds of havoc. Anyway I found bug in the condition 
> where during LeaderElector.setup() if you get a Zookeeper error the 
> LeaderElector.context get set to a context that is not fully initialized (ie 
> hasn't called joinElection..
> Anyway once this happens the node can no longer attempt to join elections 
> because every time the LeaderElector attempts to call cancelElection() on the 
> previous ElectionContext..
> Some logs below and I've attached a patch that does 2 things:
> * Move the setting of LeaderElector.context in the setup call to then of the 
> call so it is only set if the setup completes.
> * Added a check to see if leaderSeqPath is null in 
> ElectionContext.cancelElection
> * Made leaderSeqPath volatile as it is being directly accessed by multiple 
> threads.
> * set LeaderElector.context = null when joinElection fails
> There may be other issues.. the patch is focused on breaking the failure loop 
> that occurs when initialization of the ElectionContext fails.
> {noformat}
> 2014-06-08 23:14:57.805 INFO  ClientCnxn [main-SendThread(host1:1234)] - 
> Opening socket connection to server host1/10.122.142.31:1234. Will not 
> attempt to authenticate using SASL (unknown error)
> 2014-06-08 23:14:57.806 INFO  ClientCnxn [main-SendThread(host1:1234)] - 
> Socket connection established to host1/10.122.142.31:1234, initiating session
> 2014-06-08 23:14:57.810 INFO  ClientCnxn [main-SendThread(host1:1234)] - 
> Unable to reconnect to ZooKeeper service, session 0x2467d956c8d0446 has 
> expired, closing socket connection
> 2014-06-08 23:14:57.816 INFO  ConnectionManager [main-EventThread] - Watcher 
> org.apache.solr.common.cloud.ConnectionManager@7fe8e0f1 
> name:ZooKeeperConnection 
> Watcher:host4:1234,host1:1234,host3:1234,host2:1234/engines/solr/collections/XXXXXX
>  got event WatchedEvent state:Expired type:None path:null path:null type:None
> 2014-06-08 23:14:57.817 INFO  ConnectionManager [main-EventThread] - Our 
> previous ZooKeeper session was expired. Attempting to reconnect to recover 
> relationship with ZooKeeper...
> 2014-06-08 23:14:57.817 INFO  DefaultConnectionStrategy [main-EventThread] - 
> Connection expired - starting a new one...
> 2014-06-08 23:14:57.817 INFO  ZooKeeper [main-EventThread] - Initiating 
> client connection, 
> connectString=host4:1234,host1:1234,host3:1234,host2:1234/engines/solr/collections/XXXXXX
>  sessionTimeout=15000 
> watcher=org.apache.solr.common.cloud.ConnectionManager@7fe8e0f1
> 2014-06-08 23:14:57.857 INFO  ConnectionManager [main-EventThread] - Waiting 
> for client to connect to ZooKeeper
> 2014-06-08 23:14:57.859 INFO  ClientCnxn [main-SendThread(host4:1234)] - 
> Opening socket connection to server host4/172.17.14.107:1234. Will not 
> attempt to authenticate using SASL (unknown error)
> 2014-06-08 23:14:57.891 INFO  ClientCnxn [main-SendThread(host4:1234)] - 
> Socket connection established to host4/172.17.14.107:1234, initiating session
> 2014-06-08 23:14:57.906 INFO  ClientCnxn [main-SendThread(host4:1234)] - 
> Session establishment complete on server host4/172.17.14.107:1234, sessionid 
> = 0x4467d8d79260486, negotiated timeout = 15000
> 2014-06-08 23:14:57.907 INFO  ConnectionManager [main-EventThread] - Watcher 
> org.apache.solr.common.cloud.ConnectionManager@7fe8e0f1 
> name:ZooKeeperConnection 
> Watcher:host4:1234,host1:1234,host3:1234,host2:1234/engines/solr/collections/XXXXXX
>  got event WatchedEvent state:SyncConnected type:None path:null path:null 
> type:None
> 2014-06-08 23:14:57.909 INFO  ConnectionManager [main-EventThread] - Client 
> is connected to ZooKeeper
> 2014-06-08 23:14:57.909 INFO  ConnectionManager [main-EventThread] - 
> Connection with ZooKeeper reestablished.
> 2014-06-08 23:14:57.911 ERROR ZkController [Thread-203] - 
> :org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /overseer_elect/election
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.ensureExists(ZkCmdExecutor.java:100)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.ensureExists(ZkCmdExecutor.java:94)
>   at org.apache.solr.cloud.LeaderElector.setup(LeaderElector.java:324)
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:235)
>   at 
> org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
> 2014-06-08 23:14:57.911 WARN  ConnectionManager [Thread-203] - Exception 
> running onReconnect command
> org.apache.solr.common.cloud.ZooKeeperException:
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:267)
>   at 
> org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /overseer_elect/election
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.ensureExists(ZkCmdExecutor.java:100)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.ensureExists(ZkCmdExecutor.java:94)
>   at org.apache.solr.cloud.LeaderElector.setup(LeaderElector.java:324)
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:235)
>   ... 1 more
> 2014-06-08 23:14:57.917 INFO  DefaultConnectionStrategy [main-EventThread] - 
> Reconnected to ZooKeeper
> 2014-06-08 23:14:57.917 INFO  ConnectionManager [main-EventThread] - 
> Connected:true
> 2014-06-08 23:14:57.917 INFO  ClientCnxn [main-EventThread] - EventThread 
> shut down
> 2014-06-08 23:14:57.917 INFO  ZkController [Thread-205] - publishing 
> core=XXXXXX state=down collection=XXXXXX
> 2014-06-08 23:14:57.920 INFO  ZkController [Thread-205] - numShards not found 
> on descriptor - reading it from system property
> 2014-06-08 23:14:58.033 INFO  ElectionContext [Thread-205] - canceling 
> election null
> 2014-06-08 23:14:58.083 ERROR ZkController [Thread-205] - 
> :java.lang.IllegalArgumentException: Path cannot be null
>   at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
>   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:851)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$2.execute(SolrZkClient.java:177)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$2.execute(SolrZkClient.java:174)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   at org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:174)
>   at 
> org.apache.solr.cloud.ElectionContext.cancelElection(ElectionContext.java:72)
>   at 
> org.apache.solr.cloud.OverseerElectionContext.cancelElection(ElectionContext.java:447)
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
>   at 
> org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
> 2014-06-08 23:14:58.083 WARN  ConnectionManager [Thread-205] - Exception 
> running onReconnect command
> org.apache.solr.common.cloud.ZooKeeperException:
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:267)
>   at 
> org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
> Caused by: java.lang.IllegalArgumentException: Path cannot be null
>   at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
>   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:851)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$2.execute(SolrZkClient.java:177)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$2.execute(SolrZkClient.java:174)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   at org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:174)
>   at 
> org.apache.solr.cloud.ElectionContext.cancelElection(ElectionContext.java:72)
>   at 
> org.apache.solr.cloud.OverseerElectionContext.cancelElection(ElectionContext.java:447)
>   at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
>   ... 1 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to