[
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325907#comment-16325907
]
Cao Manh Dat commented on SOLR-11702:
-------------------------------------
Thanks [~shalinmangar]
{quote}
1. DUP.setupRequest skips replicas having terms. If I understand correctly,
this will mean that updates are no longer forwarded to replicas until they
publish themselves in recovery? Is that right?
{quote}
Right. If a replica's term is lower than the leader's term, the leader stops
sending updates to that replica.
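The rule above can be sketched roughly as follows. This is only an illustration of the term comparison, not the actual DUP.setupRequest code: the class name, the method name and the plain map of core node name to term are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the patch.
class TermFilterSketch {

    /**
     * Returns the replicas that should still receive updates from the leader:
     * every replica whose term is not lower than the leader's term.
     */
    static List<String> replicasToForwardTo(Map<String, Long> terms, String leader) {
        Long leaderTerm = terms.get(leader);
        if (leaderTerm == null) {
            throw new IllegalArgumentException("unknown leader: " + leader);
        }
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Long> e : terms.entrySet()) {
            if (e.getKey().equals(leader)) {
                continue; // the leader does not forward to itself
            }
            if (e.getValue() >= leaderTerm) {
                targets.add(e.getKey()); // replica is up to date with the leader's term
            }
            // replicas with a lower term are skipped until they recover
        }
        return targets;
    }
}
```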
{quote}
2. CreateCollectionCmd – throw InterruptedException directly from the method
instead of trying to handle it here
{quote}
The code that deletes old term nodes in CreateCollectionCmd is handled in
exactly the same way as the code below it, so I do not see the problem here.
{quote}
3. Mark LIR related classes/methods as deprecated – those are more likely to
get attention right before 8.0 I think.
{quote}
Sure, this is a good idea.
{quote}
5. RecoveringCoreTermWatcher – Shouldn't lastTermDoRecovery be set after
recovery completes? If not, how do we ensure that recoveries are stacked up?
{quote}
I do not see any problem in the current implementation: after we call
{{doRecovery}}, the recovery process starts shortly afterwards.
{quote}
6. RecoveringCoreTermWatcher catches NullPointerException. Do a null check
instead.
{quote}
Sure!
{quote}
7. RecoveryStrategy – why pingLeader? isn't it sufficient to use
ZkStateReader.getLeaderRetry as we used to do earlier?
{quote}
Imagine this case, where there is a network partition between the leader and a
replica:
* The leader increases the term of the replica.
* RecoveringCoreTermWatcher triggers the recovery process of the replica, and
the replica goes into recovery (hence increasing its term).
* The leader increases the term of the replica again (because it failed to send
an update to the replica, and the replica's term is now equal to the leader's
term).
* RecoveringCoreTermWatcher triggers the recovery process of the replica again,
and the replica goes into recovery (hence increasing its term).
* ... this process repeats until the network is healed.
With pingLeader, the replica first checks that it can reach the leader before
it goes into recovery, which avoids this loop.
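The loop described in the list above can be illustrated with a small simulation. This is a simplified model under stated assumptions, not Solr code: it assumes that a failed update leaves the leader's term one above the replica's, that entering recovery raises the replica's term to the leader's, and that a ping-first replica simply waits while the leader is unreachable. All names are hypothetical.

```java
// Hypothetical simulation of the term "ping-pong" during a network partition.
class TermLoopSketch {

    /**
     * Simulates the given number of rounds while leader and replica stay
     * partitioned and returns the leader's term at the end.
     */
    static long simulate(int rounds, boolean pingLeaderFirst) {
        long leaderTerm = 1;
        long replicaTerm = 1;
        boolean leaderReachable = false; // the partition lasts for the whole run

        for (int i = 0; i < rounds; i++) {
            // The leader fails to forward an update, so it makes sure its term
            // is higher than the replica's.
            if (replicaTerm >= leaderTerm) {
                leaderTerm = replicaTerm + 1;
            }
            // The term watcher notices the replica is behind and asks it to recover.
            if (pingLeaderFirst && !leaderReachable) {
                continue; // replica keeps waiting; its term is left untouched
            }
            // Entering recovery raises the replica's term to the leader's term,
            // which sets up the next bump by the leader.
            replicaTerm = leaderTerm;
        }
        return leaderTerm;
    }
}
```

In this model the leader's term grows by one per round when the replica enters recovery blindly, and stays put once the replica checks the leader first.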
{quote}
8. ZkCollectionTerms – if getShard and remove methods need to be synchronized
then seems like close can interfere. Perhaps better to synchronize on the terms
map itself.
{quote}
This is a good idea.
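A minimal sketch of what synchronizing on the terms map itself could look like, so that getShard, remove and close all take the same lock. The class and field names below are placeholders, not the ones in the patch, and the per-shard object is reduced to a closed flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-shard terms object.
class ShardTermsSketch {
    private final String shardId;
    private boolean closed;

    ShardTermsSketch(String shardId) {
        this.shardId = shardId;
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for the per-collection holder of shard terms.
class CollectionTermsSketch implements AutoCloseable {
    private final Map<String, ShardTermsSketch> terms = new HashMap<>();

    ShardTermsSketch getShard(String shardId) {
        synchronized (terms) {
            // Create the entry lazily; later callers get the same instance.
            return terms.computeIfAbsent(shardId, ShardTermsSketch::new);
        }
    }

    void remove(String shardId) {
        synchronized (terms) {
            ShardTermsSketch removed = terms.remove(shardId);
            if (removed != null) {
                removed.close();
            }
        }
    }

    @Override
    public void close() {
        // Same lock as getShard and remove, so close cannot interleave with them.
        synchronized (terms) {
            terms.values().forEach(ShardTermsSketch::close);
            terms.clear();
        }
    }
}
```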
{quote}
9. Can you explain the purpose of "new".equals(cd.getCoreProperty("lirVersion",
"new"))) used in various places?
{quote}
That flag is mostly used for testing rolling updates and can be removed in
SOLR-11812.
> Redesign current LIR implementation
> -----------------------------------
>
> Key: SOLR-11702
> URL: https://issues.apache.org/jira/browse/SOLR-11702
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
> Attachments: SOLR-11702.patch, SOLR-11702.patch
>
>
> I recently looked into some problems related to races between LIR and
> recovery. I would like to propose a totally new approach to solving the
> SOLR-5495 problem, because patching the current implementation will lead us
> to other problems (we cannot prove the correctness of the implementation).
> Feel free to give comments/thoughts about this new scheme.
> https://docs.google.com/document/d/1dM2GKMULsS45ZMuvtztVnM2m3fdUeRYNCyJorIIisEo/edit?usp=sharing