[ 
https://issues.apache.org/jira/browse/SOLR-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943128#comment-13943128
 ] 

Shalin Shekhar Mangar commented on SOLR-5860:
---------------------------------------------

Okay, the failures were unrelated. I just got two clean test passes. I'll 
commit this shortly.

> Logging around core wait for state during startup / recovery is confusing
> -------------------------------------------------------------------------
>
>                 Key: SOLR-5860
>                 URL: https://issues.apache.org/jira/browse/SOLR-5860
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>         Attachments: SOLR-5860.patch
>
>
> I'm seeing some log messages like this:
> I was asked to wait on state recovering for HOST:8984_solr but I still do not 
> see the requested state. I see state: recovering live:true
> This is very confusing because from the log, it seems like it's waiting to 
> see the state it's in ... After digging through the code, it appears that it 
> is really waiting for a leader to become active so that it has a leader to 
> recover from.
> I'd like to improve the logging around this critical wait loop to give better 
> context to what is happening. 
> Also, I would like to change the following so that we force state updates 
> every 15 seconds for the entire wait period.
> -          if (retry == 15 || retry == 60) {
> +          if (retry % 15 == 0) {
> As-is, it waits 120 seconds but forces a state update only twice: once 
> after 15 seconds and again after 60. It might be good to force updates 
> throughout the full wait period.
> Lastly, I think it would be good to use the leaderConflictResolveWait setting 
> (from ZkController) here as well since 120 may not be enough for a leader to 
> become active in a busy cluster, esp. after the node the Overseer is running 
> on. Maybe leaderConflictResolveWait + 5 seconds?
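
To illustrate the retry change quoted above, here is a minimal sketch that compares the two conditions over a 120-iteration wait. It is not the Solr code: the class and method names are hypothetical, and it assumes one retry per second with the counter running from 1 to 120.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: lists the retry counts at which
// each condition from the diff would force a cluster state update.
final class ForcedRefreshSchedule {

    // Original condition: forces an update only at retry 15 and retry 60.
    // Assumption: the retry counter runs from 1 to maxRetries inclusive.
    static List<Integer> original(int maxRetries) {
        List<Integer> forced = new ArrayList<>();
        for (int retry = 1; retry <= maxRetries; retry++) {
            if (retry == 15 || retry == 60) {
                forced.add(retry);
            }
        }
        return forced;
    }

    // Proposed condition: forces an update on every 15th retry.
    static List<Integer> proposed(int maxRetries) {
        List<Integer> forced = new ArrayList<>();
        for (int retry = 1; retry <= maxRetries; retry++) {
            if (retry % 15 == 0) {
                forced.add(retry);
            }
        }
        return forced;
    }

    public static void main(String[] args) {
        System.out.println("original: " + original(120));
        System.out.println("proposed: " + proposed(120));
    }
}
```

Under these assumptions the original condition forces two updates (at 15 and 60), while the proposed one forces eight, spaced evenly up to 120. Note that if the counter started at 0, `retry % 15 == 0` would also fire on the very first iteration.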



--
This message was sent by Atlassian JIRA
(v6.2#6252)
