[
https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338469#comment-14338469
]
Shalin Shekhar Mangar commented on SOLR-6923:
---------------------------------------------
I was going to backport this to 4.10.4, but then I realized that this code has:
{code}
if (lastClusterStateVersion == clusterState.getZkClusterStateVersion() &&
baseUrlForBadNodes.size() == 0 &&
liveNodes.equals(clusterState.getLiveNodes())) {
...
}
{code}
Two Number objects are compared using == instead of .equals(), which is only
guaranteed to work if the values are between -128 and 127. This is buggy!
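To illustrate the difference, here is a minimal standalone sketch. It assumes the compared values are boxed Integers; the class and method names are hypothetical and are not taken from the Solr code base.

```java
import java.util.Objects;

public class BoxedVersionCompare {
    // Reference comparison: true only if both variables point to the same Integer object.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison: null-safe and independent of the Integer cache.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Autoboxed values in -128..127 are cached, so == happens to work here.
        Integer small1 = 127, small2 = 127;
        System.out.println(sameReference(small1, small2)); // true

        // Outside that range the result of == depends on JVM caching; do not rely on it.
        Integer big1 = 1000, big2 = 1000;
        System.out.println(sameReference(big1, big2));

        // Comparing by value gives the intended answer for any version number.
        System.out.println(sameValue(big1, big2)); // true
    }
}
```

Comparing with Objects.equals (or .equals after a null check) keeps the check correct once the version number grows past the cached range.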
> AutoAddReplicas should consult live nodes also to see if a state has changed
> ----------------------------------------------------------------------------
>
> Key: SOLR-6923
> URL: https://issues.apache.org/jira/browse/SOLR-6923
> Project: Solr
> Issue Type: Bug
> Reporter: Varun Thacker
> Assignee: Mark Miller
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6923.patch
>
>
> - I did the following
> {code}
> ./solr start -e cloud -noprompt
> kill -9 <pid-of-node2> //Not the node which is running ZK
> {code}
> - /live_nodes reflects that the node is gone.
> - This is the only message which gets logged on the node1 server after
> killing node2
> {code}
> 45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN
> org.apache.zookeeper.server.NIOServerCnxn – caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x14ac40f26660001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> - The graph shows node2 in the 'Gone' state
> - clusterstate.json keeps showing the replica as 'active'
> {code}
> {"collection1":{
> "shards":{"shard1":{
> "range":"80000000-7fffffff",
> "state":"active",
> "replicas":{
> "core_node1":{
> "state":"active",
> "core":"collection1",
> "node_name":"169.254.113.194:8983_solr",
> "base_url":"http://169.254.113.194:8983/solr",
> "leader":"true"},
> "core_node2":{
> "state":"active",
> "core":"collection1",
> "node_name":"169.254.113.194:8984_solr",
> "base_url":"http://169.254.113.194:8984/solr"}}}},
> "maxShardsPerNode":"1",
> "router":{"name":"compositeId"},
> "replicationFactor":"1",
> "autoAddReplicas":"false",
> "autoCreated":"true"}}
> {code}
> One immediate problem I can see is that AutoAddReplicas doesn't work since
> the clusterstate.json never changes. There might be more features which are
> affected by this.
> On first thought, I think we can handle this: the shard leader could listen
> to changes on /live_nodes and, if it has replicas on a node that disappeared,
> mark them as 'down' in clusterstate.json.
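
The check proposed in the description can be sketched as plain set logic. This is only an illustration under stated assumptions: the class, the method, and the replica-to-node map are hypothetical and do not use Solr or ZooKeeper APIs; the node names are the ones from the clusterstate.json above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DownReplicaFinder {
    // Returns the names of replicas whose node_name is missing from the live-node set.
    static List<String> replicasOnDeadNodes(Map<String, String> replicaToNode,
                                            Set<String> liveNodes) {
        List<String> down = new ArrayList<>();
        for (Map.Entry<String, String> entry : replicaToNode.entrySet()) {
            if (!liveNodes.contains(entry.getValue())) {
                down.add(entry.getKey());
            }
        }
        return down;
    }

    public static void main(String[] args) {
        // Replica -> node_name, as listed in the clusterstate.json from the report.
        Map<String, String> replicaToNode = new LinkedHashMap<>();
        replicaToNode.put("core_node1", "169.254.113.194:8983_solr");
        replicaToNode.put("core_node2", "169.254.113.194:8984_solr");

        // After killing node2, only node1 remains under /live_nodes.
        Set<String> liveNodes =
            new HashSet<>(Collections.singletonList("169.254.113.194:8983_solr"));

        System.out.println(replicasOnDeadNodes(replicaToNode, liveNodes)); // [core_node2]
    }
}
```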