[ 
https://issues.apache.org/jira/browse/SOLR-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855910#comment-13855910
 ] 

Mark Miller commented on SOLR-5577:
-----------------------------------

bq.  If a shard updates its state, wouldn't the Overseer have to process the 
state update? 

Correct. And there is already some window as well - I'm not sure it matters 
that the window is a little larger due to the the possibly very small change in 
probability of it being an issue. We have to think about it carefully, but for 
the most part, this is just a preventative measure to make sure some node is 
not going rogue for a long period of time with a cached, stale cluster state 
and no connection to ZooKeeper for some reason, but perhaps a connection to 
other nodes.

I think we always intended to think about ways to relax it, but when putting 
things together initially, it was faster/easier to just lock this down as much 
as possible to start.

> indexing delay due to zookeeper election
> ----------------------------------------
>
>                 Key: SOLR-5577
>                 URL: https://issues.apache.org/jira/browse/SOLR-5577
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Christine Poerschke
>
> The behaviour we observed was that a zookeeper election took about 2s plus 
> 1.5s for reading the zoo_data snapshot. During this time solr tried to 
> establish connections to any zookeeper in the ensemble but only succeeded 
> once a leader was elected *and* that leader was done reading the snapshot. 
> Solr document updates were slowed down during this time window.
> Is this expected to happen during and shortly after elections, that is 
> zookeeper closing existing connections, rejecting new connections and thus 
> stalling solr updates?
> Other than avoiding zookeeper elections, are there ways of reducing their 
> impact on solr?
> +zookeeper log extract+
> {noformat}
> 08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
> 08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
> 08:18:56,918 [FileSnap.java:83] Reading snapshot ...
> ...
> 08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection from 
> ...
> 08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session 0x0 
> due to java.io.IOException: ZooKeeperServer not running
> 08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client 
> ... (no session established for client)
> ...
> 08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...
> {noformat}
> +solr log extract+
> {noformat}
> 08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from 
> server sessionid ... likely server has closed socket, closing socket 
> connection and attempting reconnect
> 08:18:55,068 [ConnectionManager.java:72] Watcher 
> org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection 
> Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
> state:Disconnected type:None path:null path:null type:None
> 08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
> ...
> 08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from 
> server sessionid ... likely server has closed socket, closing socket 
> connection and attempting reconnect
> ...
> 08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:57,641 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,353 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on server 
> ... sessionid = ..., negotiated timeout = ...
> 08:18:58,751 ... [ConnectionManager.java:72] Watcher 
> org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection 
> Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
> state:SyncConnected type:None path:null path:null type:None
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to