[jira] [Comment Edited] (SOLR-10285) Reduce state messages when there are leader only shards

Cao Manh Dat (JIRA) Sun, 01 Oct 2017 19:38:25 -0700

    [ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638 ]


Cao Manh Dat edited comment on SOLR-10285 at 10/2/17 2:38 AM:
--------------------------------------------------------------

Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I did some 
searching and found that:
- ElectionContext is the only place that uses OverseerAction.Leader ( once to 
unset the leader and once to set the leader ).
- The STATE_PROP used in the second case is the replica's state, which is not 
even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" is unfounded, right?

The only problem that can occur during an upgrade is:
1. A replica ( repA ) is currently the leader.
2. The overseer is very busy.
3. repA issues an unset leader operation ( which is delayed because the 
overseer is very busy ).
4. repA is stopped in the middle of the election process ( so the set leader 
operation is never executed ).
5. repA starts with the new code and sees that it is still the leader ( the 
unset operation from step 3 has not been executed yet ), so it skips the set 
leader operation.
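
The sequence above can be sketched as a toy simulation. This is only an 
illustration under stated assumptions: the class, the queue, and the method 
names are hypothetical and are not Solr APIs, and the skip check merely models 
what the new code is described as doing in step 5.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the upgrade race; none of these names exist in Solr.
class LeaderUpgradeRace {
    // Hypothetical stand-in for the two uses of the leader operation.
    enum Op { UNSET_LEADER, SET_LEADER }

    // Pending operations that a busy overseer has not processed yet.
    private final Queue<Op> overseerQueue = new ArrayDeque<>();

    // Step 1: the published cluster state says repA is the leader.
    private String publishedLeader = "repA";

    // Old code path: always enqueue the operation.
    void enqueue(Op op) {
        overseerQueue.add(op);
    }

    // Assumed new code path: skip "set leader" if the published state
    // already names this replica as leader. Returns true if enqueued.
    boolean setLeaderIfNeeded(String replica) {
        if (replica.equals(publishedLeader)) {
            return false; // skipped, state looks correct (but is stale)
        }
        enqueue(Op.SET_LEADER);
        return true;
    }

    // The overseer finally catches up and applies everything queued.
    void drain(String replica) {
        Op op;
        while ((op = overseerQueue.poll()) != null) {
            publishedLeader = (op == Op.SET_LEADER) ? replica : null;
        }
    }

    static String simulate() {
        LeaderUpgradeRace cluster = new LeaderUpgradeRace();
        // Steps 2-3: repA enqueues "unset leader"; the overseer is busy.
        cluster.enqueue(Op.UNSET_LEADER);
        // Step 4: repA is stopped, so SET_LEADER is never enqueued.
        // Step 5: repA restarts with the new code and checks the stale state.
        boolean enqueued = cluster.setLeaderIfNeeded("repA");
        // Later the overseer processes the delayed "unset leader".
        cluster.drain("repA");
        return "setLeaderEnqueued=" + enqueued + ", leader=" + cluster.publishedLeader;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // setLeaderEnqueued=false, leader=null
    }
}
```

In this model the shard ends up with no published leader only because the 
delayed unset is applied after the restarted replica has already decided to 
skip the set leader operation.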

I think the above case is extremely rare, and even if it happens, sysadmins 
must first deal with the Overseer being overwhelmed by the number of operations. 







> Reduce state messages when there are leader only shards
> -------------------------------------------------------
>
>                 Key: SOLR-10285
>                 URL: https://issues.apache.org/jira/browse/SOLR-10285
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-10285.patch
>
>
> For shards that have only one replica ( the leader ), we know it doesn't need 
> to recover from anyone. We should short-circuit the recovery process in this 
> case. 
> The motivation is that we will generate fewer state events and be able to 
> mark these replicas as active again without them needing to go into the 
> 'recovering' state. 
> We already short-circuit when you set {{-Dsolrcloud.skip.autorecovery=true}}, 
> but that sys prop was meant for tests only. Extending this so that the code 
> short-circuits when the core knows it's the only replica in the shard is the 
> motivation of this Jira.
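
The short-circuit condition described in the issue can be sketched as 
follows. This is a minimal illustration, not the attached patch: the class and 
method are hypothetical, and the replica count is assumed to be supplied by 
the caller from the cluster state. Only the system property name comes from 
the description above.

```java
// Hypothetical helper; not part of Solr.
final class RecoveryShortCircuit {
    // Property name as quoted in the issue description.
    static final String SKIP_PROP = "solrcloud.skip.autorecovery";

    // Returns true when recovery can be skipped: either the test-only
    // system property is set, or the core is the only replica in its shard.
    static boolean shouldSkipRecovery(int replicaCountInShard) {
        if (Boolean.getBoolean(SKIP_PROP)) {
            return true;
        }
        return replicaCountInShard == 1;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipRecovery(1));
        System.out.println(shouldSkipRecovery(3));
    }
}
```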



