[
https://issues.apache.org/jira/browse/SOLR-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jessica Cheng updated SOLR-6402:
--------------------------------
Description:
We saw an incident in which a brief ZK connection blip caused the
OverseerCollectionProcessor thread to stop, while the ClusterStateUpdater
logged an error but kept running and the node did not lose its leadership.
Because the node was still the leader but its processor thread had exited,
our collection work queue backed up.
On trunk, OverseerCollectionProcessor's run method currently contains:
{quote}
if (e.code() == KeeperException.Code.SESSIONEXPIRED
    || e.code() == KeeperException.Code.CONNECTIONLOSS) {
  log.warn("Overseer cannot talk to ZK");
  return;
}
{quote}
I think this if statement should apply only to SESSIONEXPIRED. If the node
merely experiences a connection loss but then reconnects before the session
expires, it will remain the leader, so the thread should keep running.
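A minimal sketch of the proposed decision, not the actual patch. To keep it
compilable without the ZooKeeper jar, it uses a hypothetical ZkErrorCode enum
as a stand-in for the two KeeperException.Code values above; the class and
method names are likewise assumptions made for illustration.

```java
// Sketch only: ZkErrorCode is a hypothetical stand-in for
// KeeperException.Code, so this compiles with the JDK alone.
class OverseerExitPolicy {
    enum ZkErrorCode { SESSIONEXPIRED, CONNECTIONLOSS }

    /** Returns true only when the processing loop should stop. */
    static boolean shouldExit(ZkErrorCode code) {
        // Proposed rule: exit on an expired session only. A connection loss
        // may be followed by a reconnect within the same session, in which
        // case the node is still the leader and must keep draining the queue.
        return code == ZkErrorCode.SESSIONEXPIRED;
    }

    public static void main(String[] args) {
        for (ZkErrorCode code : ZkErrorCode.values()) {
            String action = shouldExit(code) ? "exit" : "keep running";
            System.out.println(code + " -> " + action);
        }
    }
}
```

In the real run method this would amount to dropping the CONNECTIONLOSS
clause from the condition, so that a connection loss is logged and the loop
continues instead of returning.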
> OverseerCollectionProcessor should not exit for ZK ConnectionLoss
> -----------------------------------------------------------------
>
> Key: SOLR-6402
> URL: https://issues.apache.org/jira/browse/SOLR-6402
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.8, 5.0
> Reporter: Jessica Cheng