Hoss Man created SOLR-9361:
------------------------------

             Summary: Concept of replica state being "down" is confusing and 
misleading (especially w/DELETEREPLICA)
                 Key: SOLR-9361
                 URL: https://issues.apache.org/jira/browse/SOLR-9361
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man



In this thread on solr-user, Jerome Yang pointed out some really confusing 
behavior regarding a "down" node and how DELETEREPLICA behaves when a node is 
not shut down cleanly...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3CCA+8Dz=26QuB5qNogG_GNXUU7Ru2JQQ94oH5qJvfztPvn+h=2...@mail.gmail.com%3E

I'll post a comment in a moment with a detailed walk-through of how confusing 
the "state" of a node/replica can be when a machine crashes, but the summary 
highlights are...

* Admin UI & CLUSTERSTATUS API use different terminology to describe replicas 
hosted on machines that can't be reached
** CLUSTERSTATUS API lists the status as "down"
** the Admin UI displays them as "Gone" (even though it also has an option for 
"Down" which never seems to be used)
* Neither the Admin UI nor the CLUSTERSTATUS API distinguishes replicas on nodes 
that were shut down cleanly from replicas on nodes that just vanished from the 
cluster (ie: catastrophic failure / network partitioning)
* DELETEREPLICA w/ {{onlyIfDown=true}} only works if a replica was shut down 
cleanly
** For a replica that was on a node that had a catastrophic failure, using 
{{onlyIfDown=true}} causes an error that the replica {{state is 'active'}}
*** This is in spite of the fact that the CLUSTERSTATUS API explicitly says 
{{"state":"down"}} for that replica
* DELETEREPLICA on any replica that was hosted on a node that is no longer up 
(either because it was cleanly shut down and {{onlyIfDown=true}} is used, or 
because it is down for any reason and {{onlyIfDown=false}} is used) generates a 
failure that "{{Server refused connection}}"
** This is in spite of the fact that the DELETEREPLICA otherwise appears to have 
succeeded
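
To make the mismatch above concrete, here is a minimal Python sketch. It is only a model, not Solr's actual code: it assumes the "down" shown by CLUSTERSTATUS is derived from the node being absent from the live nodes list, while the {{onlyIfDown=true}} check looks at the last recorded state of the replica. The base URL, node names, collection, shard and replica names are all made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def reported_state(recorded_state, node_name, live_nodes):
    """State as a CLUSTERSTATUS-style view might report it (assumed model).

    A replica whose node is not in live_nodes is shown as "down",
    regardless of the state that was last recorded for it.
    """
    if node_name not in live_nodes:
        return "down"
    return recorded_state


def delete_replica_url(collection, shard, replica, only_if_down):
    """Build a Collections API DELETEREPLICA URL (nothing is sent)."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "onlyIfDown": "true" if only_if_down else "false",
    }
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


# A replica whose node crashed: the last recorded state is still "active",
# but the node is missing from the (hypothetical) live nodes set.
live_nodes = {"host1:8983_solr"}
recorded = "active"

print(reported_state(recorded, "host2:8983_solr", live_nodes))
# Under the assumed model, an onlyIfDown check against the recorded state
# would reject the delete even though the reported state is "down".
print(recorded == "down")
print(delete_replica_url("test", "shard1", "core_node2", True))
```

If this model is close to what happens, a replica on a crashed node is reported as "down" yet fails the {{onlyIfDown=true}} check, which matches the {{state is 'active'}} error described above.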


...there are probably multiple underlying bugs here that are exponentially 
worse in the context of each other.  We should spin off new issues as needed to 
track them once they are concretely identified, but I wanted to open this 
"uber issue" to capture the overall experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

