[jira] [Commented] (SOLR-9361) Concept of replica state being "down" is confusing and misleading (especially w/DELETEREPLICA)

Mark Miller (JIRA) Fri, 29 Jul 2016 22:02:52 -0700

    [ https://issues.apache.org/jira/browse/SOLR-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400449#comment-15400449 ]


Mark Miller commented on SOLR-9361:
-----------------------------------

Historical info. "Gone" means there is no live node in ZooKeeper. "Down" should 
mean either stale state with no ZooKeeper connection, or a node that is 
connected to ZooKeeper and working to move from down to recovering.
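A minimal sketch of the distinction drawn above, assuming a replica is described only by its published state string and its node name, and that "live" means the node is present in ZooKeeper's live nodes. The function `effective_state` and its inputs are hypothetical illustrations, not Solr's API.

```python
def effective_state(published_state: str, node_name: str, live_nodes: set) -> str:
    """Hypothetical helper: label a replica per the semantics described above."""
    if node_name not in live_nodes:
        # No live node in ZooKeeper: whatever state was last published is
        # stale history, so the replica is "gone".
        return "gone"
    # The node is connected to ZooKeeper, so the published state is current,
    # e.g. "down" while it works toward "recovering", then "active".
    return published_state


live = {"host1:8983_solr"}
print(effective_state("active", "host2:8983_solr", live))  # crashed node: gone
print(effective_state("down", "host1:8983_solr", live))    # starting up: down
```

Under this reading, "down" only carries meaning while the node is live; once the live node disappears, the published state is just historical.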

> Concept of replica state being "down" is confusing and misleading 
> (especially w/DELETEREPLICA)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9361
>                 URL: https://issues.apache.org/jira/browse/SOLR-9361
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>
> In this thread on solr-user, Jerome Yang pointed out some really confusing 
> behavior regarding a "down" node and DELETEREPLICA's behavior when a node is 
> not shut down cleanly...
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3CCA+8Dz=26QuB5qNogG_GNXUU7Ru2JQQ94oH5qJvfztPvn+h=2...@mail.gmail.com%3E
> I'll post a comment in a moment with a detailed walkthrough of how 
> confusing the "state" of a node/replica can be when a machine crashes, but 
> the summary highlights are...
> * The Admin UI & CLUSTERSTATUS API use different terminology to describe 
> replicas hosted on machines that can't be reached
> ** CLUSTERSTATUS API lists the status as "down"
> ** the Admin UI displays them as "Gone" (even though it also has an option 
> for "Down" which never seems to be used)
> * Neither the Admin UI nor the CLUSTERSTATUS API distinguishes replicas on 
> nodes that were shut down cleanly from replicas on nodes that just vanished 
> from the cluster (ie: catastrophic failure / network partitioning)
> * DELETEREPLICA w/ {{onlyIfDown=true}} only works if a replica was shut down 
> cleanly
> ** For a replica that was on a node that had a catastrophic failure, using 
> {{onlyIfDown=true}} causes an error that the replica {{state is 'active'}}
> *** This in spite of the fact that CLUSTERSTATUS API explicitly says 
> {{"state":"down"}} for that replica
> * DELETEREPLICA on any replica that was hosted on a node that is no longer up 
> (either because it was cleanly shut down and {{onlyIfDown=true}} is used, or 
> because it is down for any reason and {{onlyIfDown=false}} is used) generates 
> a "{{Server refused connection}}" failure
> ** This in spite of the fact that the DELETEREPLICA otherwise appears to have 
> succeeded
> ...there are probably multiple underlying bugs here that are exponentially 
> worse in the context of each other.  We should spin off new issues as needed 
> to track them once they are concretely identified, but I wanted to open this 
> "uber issue" to capture the overall experience.
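The DELETEREPLICA mismatch described above can be sketched as follows, assuming (as the report describes) that CLUSTERSTATUS reports "down" based on node liveness while the onlyIfDown check reads only the published state. `clusterstatus_state` and `only_if_down_permits` are hypothetical models of the reported behavior, not Solr source. The request parameters (`action`, `collection`, `shard`, `replica`, `onlyIfDown`) are those of the Collections API; the base URL, collection, shard, and replica names are placeholders.

```python
from urllib.parse import urlencode


def delete_replica_url(base, collection, shard, replica, only_if_down):
    """Build a Collections API DELETEREPLICA request URL (base URL is a placeholder)."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "onlyIfDown": "true" if only_if_down else "false",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def clusterstatus_state(published_state, node_name, live_nodes):
    """Hypothetical model: report "down" for any replica whose node is not live."""
    return published_state if node_name in live_nodes else "down"


def only_if_down_permits(published_state):
    """Hypothetical model: onlyIfDown=true consults only the published state."""
    return published_state == "down"


# A replica on a node that crashed never got the chance to publish "down".
live_nodes = {"host1:8983_solr"}
published, node = "active", "host2:8983_solr"
print(clusterstatus_state(published, node, live_nodes))  # down
print(only_if_down_permits(published))                    # False
print(delete_replica_url("http://localhost:8983/solr", "collection1",
                         "shard1", "core_node2", True))
```

In this model the two checks disagree for exactly the crashed-node case in the report: CLUSTERSTATUS says "down" while onlyIfDown rejects the replica as 'active'.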



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
