[
https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638
]
Cao Manh Dat edited comment on SOLR-10285 at 10/2/17 2:38 AM:
--------------------------------------------------------------
Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I did some
searching and found that:
- ElectionContext is the only place that uses OverseerAction.Leader (once to
unset the leader and once to set the leader).
- STATE_PROP, used in the second case, is the replica's state, which is not even
used in {{SliceMutator.setShardLeader}}.
So your concern about "mark the shard as inactive" does not apply, right?
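To make the argument concrete, here is a minimal sketch, assuming a toy model of a shard and of a LEADER message. It is not Solr's SliceMutator: the Shard class, set_shard_leader, and the message keys "core_node_name" and "state" are all hypothetical. It only shows the point above: if the handler reads nothing but the leader's core node name, a replica-level state in the message cannot change the shard's state.

```python
# Hypothetical, simplified model of LEADER message handling.
# NOT Solr's SliceMutator; class, function, and message keys are illustrative.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Shard:
    state: str = "active"                 # shard-level state
    leader: Optional[str] = None          # core node name of the current leader
    replica_states: Dict[str, str] = field(default_factory=dict)


def set_shard_leader(shard: Shard, message: Dict[str, str]) -> Shard:
    """Set or unset the shard leader from a LEADER message.

    Only the core node name is read; a missing name means "unset leader".
    Any replica-level "state" entry in the message is ignored, so the
    shard state is copied through unchanged.
    """
    leader = message.get("core_node_name")
    return Shard(state=shard.state, leader=leader,
                 replica_states=dict(shard.replica_states))


shard = Shard(replica_states={"core_node1": "active"})
after = set_shard_leader(shard, {"core_node_name": "core_node1", "state": "inactive"})
print(after.state, after.leader)  # the "inactive" replica state does not touch the shard
```

In this model, the "set" and "unset" cases differ only in whether a core node name is present, which mirrors the two ElectionContext call sites described above.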
The only problem that can occur during an upgrade is:
1. A replica (repA) is currently the leader.
2. The overseer is very busy.
3. repA performs the unset-leader operation (which is delayed because the
overseer is very busy).
4. repA is stopped in the middle of the election process (so the set-leader
operation never gets executed).
5. repA starts with the new code and sees that it is the leader (the unset
operation from step 3 has not been executed yet), so it skips the set-leader
operation.
I think the above case is extremely rare, and even if it happens, sysadmins
must first deal with the overwhelming number of operations queued in the Overseer.
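The five steps can be replayed with a toy FIFO queue. This is a minimal sketch, assuming the overseer applies queued leader messages strictly in order and that cluster state only changes when a message is processed. ToyOverseer, simulate_upgrade_race, and the skip_if_already_leader flag are hypothetical names, not Solr APIs. With the skip enabled (the new behavior in step 5), the stale "unset" is applied last and the shard ends up with no leader. Without the skip, the later "set" message repairs it.

```python
# Hypothetical replay of the upgrade race; names are illustrative, not Solr APIs.
from collections import deque
from typing import Deque, Optional, Tuple

# A leader message: ("leader", core name); a core name of None means "unset".
Message = Tuple[str, Optional[str]]


class ToyOverseer:
    """Queued messages change cluster state only once they are processed."""

    def __init__(self) -> None:
        self.queue: Deque[Message] = deque()
        self.leader: Optional[str] = None  # leader as recorded in cluster state

    def offer(self, msg: Message) -> None:
        self.queue.append(msg)

    def process_all(self) -> None:
        # Apply queued messages strictly in FIFO order.
        while self.queue:
            _, core = self.queue.popleft()
            self.leader = core


def simulate_upgrade_race(skip_if_already_leader: bool) -> Optional[str]:
    """Replay steps 1-5 and return the leader left in cluster state."""
    overseer = ToyOverseer()
    # 1. repA is currently the leader.
    overseer.offer(("leader", "repA"))
    overseer.process_all()
    # 2-3. The overseer is busy: repA's "unset leader" message is only queued.
    overseer.offer(("leader", None))
    # 4. repA is stopped before it sends "set leader".
    # 5. repA restarts; cluster state still names it leader.
    if not (skip_if_already_leader and overseer.leader == "repA"):
        overseer.offer(("leader", "repA"))
    # Eventually the overseer catches up and applies the queue in order.
    overseer.process_all()
    return overseer.leader


print(simulate_upgrade_race(skip_if_already_leader=True))   # None: shard left leaderless
print(simulate_upgrade_race(skip_if_already_leader=False))  # repA
```

The sketch also shows why the window is narrow: it needs an "unset" message that is still queued across a node restart, which only happens when the overseer queue is badly backed up.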
> Reduce state messages when there are leader only shards
> -------------------------------------------------------
>
> Key: SOLR-10285
> URL: https://issues.apache.org/jira/browse/SOLR-10285
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Varun Thacker
> Assignee: Cao Manh Dat
> Attachments: SOLR-10285.patch
>
>
> For shards that have one replica (the leader), we know it doesn't need to
> recover from anyone. We should short-circuit the recovery process in this case.
> The motivation is that we will generate fewer state events and be able to
> mark these replicas as active again without them needing to go into the
> 'recovering' state.
> We already short-circuit when you set {{-Dsolrcloud.skip.autorecovery=true}},
> but that sys prop was meant for tests only. The motivation of this Jira is to
> extend this so that the code short-circuits whenever the core knows it's the
> only replica in the shard.
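> The proposed decision can be sketched as a small predicate. This is a minimal
> sketch, assuming the core knows its own core node name and the list of replicas
> in its shard. should_skip_recovery and published_states are hypothetical
> helpers, not Solr code, and Java system properties are modeled as a plain
> mapping. The only name taken from the issue is solrcloud.skip.autorecovery.

```python
# Hypothetical model of the proposed short-circuit; not Solr code.
from typing import List, Mapping, Sequence


def should_skip_recovery(core_node: str, shard_replicas: Sequence[str],
                         sys_props: Mapping[str, str]) -> bool:
    """Decide whether a starting core can skip recovery."""
    # Existing, test-only switch named in the issue.
    if sys_props.get("solrcloud.skip.autorecovery", "false").lower() == "true":
        return True
    # Proposed rule: the sole replica of a shard has nobody to recover from.
    return list(shard_replicas) == [core_node]


def published_states(core_node: str, shard_replicas: Sequence[str],
                     sys_props: Mapping[str, str]) -> List[str]:
    """State messages the core would publish on startup in this model."""
    if should_skip_recovery(core_node, shard_replicas, sys_props):
        return ["active"]
    return ["recovering", "active"]


print(published_states("core_node1", ["core_node1"], {}))                 # leader-only shard
print(published_states("core_node1", ["core_node1", "core_node2"], {}))  # normal recovery
```

> In this model a leader-only shard publishes one state message instead of two,
> which is the reduction in state events the issue is after.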
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)