Hoss Man created SOLR-9361:
------------------------------
Summary: Concept of replica state being "down" is confusing and
misleading (especially w/DELETEREPLICA)
Key: SOLR-9361
URL: https://issues.apache.org/jira/browse/SOLR-9361
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man
In this thread on solr-user, Jerome Yang pointed out some really confusing
behavior regarding a "down" node and DELETEREPLICA's behavior when a node is
not shut down cleanly...
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3CCA+8Dz=26QuB5qNogG_GNXUU7Ru2JQQ94oH5qJvfztPvn+h=2...@mail.gmail.com%3E
I'll post a comment in a moment with a detailed walkthrough of how confusing
the "state" of a node/replica can be when a machine crashes, but the summary
highlights are...
* The Admin UI & CLUSTERSTATUS API use different terminology to describe
replicas hosted on machines that can't be reached
** CLUSTERSTATUS API lists the status as "down"
** the Admin UI displays them as "Gone" (even though it also has an option for
"Down" which never seems to be used)
* Neither the Admin UI nor the CLUSTERSTATUS API distinguishes replicas on
nodes that were shut down cleanly from replicas on nodes that just vanished
from the cluster (i.e., catastrophic failure / network partitioning)
* DELETEREPLICA w/ {{onlyIfDown=true}} only works if a replica was shut down
cleanly
** For a replica that was on a node that had a catastrophic failure, using
{{onlyIfDown=true}} causes an error that the replica {{state is 'active'}}
*** This in spite of the fact that CLUSTERSTATUS API explicitly says
{{"state":"down"}} for that replica
* DELETEREPLICA on any replica that was hosted on a node that is no longer up
(either because it was cleanly shut down and {{onlyIfDown=true}} is used, or
because it is down for any reason and {{onlyIfDown=false}} is used) generates
a failure that "{{Server refused connection}}"
** This in spite of the fact that the DELETEREPLICA otherwise appears to have
succeeded
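For anyone trying to reproduce the above, the two requests involved can be sketched roughly as follows. This is only a minimal sketch: the base URL, the collection name, the shard name and the replica name are all placeholder assumptions, and the helper functions are hypothetical conveniences that just build Collections API URLs (nothing is sent to a server).

```python
from urllib.parse import urlencode

# Assumption: a still-live Solr node listening on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action (hypothetical helper)."""
    query = urlencode({"action": action, "wt": "json", **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def clusterstatus_url(collection):
    # CLUSTERSTATUS is where the replica is reported as "state":"down".
    return collections_api_url("CLUSTERSTATUS", collection=collection)


def deletereplica_url(collection, shard, replica, only_if_down):
    # onlyIfDown is sent as a lowercase "true"/"false" string.
    return collections_api_url(
        "DELETEREPLICA",
        collection=collection,
        shard=shard,
        replica=replica,
        onlyIfDown=str(only_if_down).lower(),
    )


if __name__ == "__main__":
    # Placeholder names; substitute the ones from your own cluster.
    print(clusterstatus_url("test_collection"))
    print(deletereplica_url("test_collection", "shard1", "core_node1", True))
```

Comparing the replica state in the CLUSTERSTATUS response against the error returned by the DELETEREPLICA request with {{onlyIfDown=true}} should show the mismatch described above.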
...there are probably multiple underlying bugs here that are exponentially
worse in the context of each other. We should spin off new issues as needed to
track them once they are concretely identified, but I wanted to open this
"uber issue" to capture the overall experience.