[ https://issues.apache.org/jira/browse/SOLR-15288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312808#comment-17312808 ]

Ishan Chattopadhyaya commented on SOLR-15288:
---------------------------------------------

Testing further with two separate Solr install directories instead of a single one, we hit the following bug.

Build Solr, create two directories and start ZK:
# pkill -9 java; rm -rf firstsolr secondsolr
# cd solr; ant package
# tar -xf package/solr-8.8.1-SNAPSHOT.tgz; mv solr-8.8.1-SNAPSHOT firstsolr; cp -r firstsolr secondsolr
# docker container prune -f && docker run -it -p 2181:2181 --name=zk1 -h zk1 zookeeper:3.5.6

Start the first Solr node, create a PRS collection, then start the second node and add a replica there:
# cd firstsolr
# bin/solr -c -p 9000 -z localhost:2181
# curl "http://localhost:9000/solr/admin/collections?action=CREATE&name=mycoll&numShards=1&perReplicaState=true"

# cd ../secondsolr
# bin/solr -c -p 9001 -z localhost:2181
# curl "http://localhost:9000/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1"

Check the PRS znodes:
# docker exec -it zk1 /apache-zookeeper-3.5.6-bin/bin/zkCli.sh stat /collections/mycoll/state.json
# bin/solr zk ls -r /collections/mycoll/state.json -z localhost:2181
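To read these listings without eyeballing them, here is a minimal sketch of a hypothetical `prs_states` shell helper (not part of Solr). It assumes each per-replica-state child znode of state.json shows up on its own line named `<replica>:<version>:<state>[:L]`, with the state letter being A (active), D (down), R (recovering) or F (recovery failed) and a trailing `L` marking the leader; when a replica briefly has more than one entry, it keeps the highest version. The exact layout of the `bin/solr zk ls -r` output (indentation, path prefixes) is also an assumption, so the helper strips both.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Solr): summarise PRS entries read on stdin.
# ASSUMPTION: each child znode of state.json appears on its own line as
# <replica>:<version>:<state>[:L], possibly indented or prefixed by its parent
# path. Lines that don't match (e.g. the state.json path itself) are skipped.
prs_states() {
  awk '
    {
      line = $0
      sub(/^.*\//, "", line)          # drop any parent-path prefix
      gsub(/[[:space:]]/, "", line)   # drop indentation and trailing blanks
      n = split(line, f, ":")
      if (n < 3 || f[2] !~ /^[0-9]+$/) next   # not a PRS entry
      r = f[1]; v = f[2] + 0
      # keep only the highest-version entry per replica
      if (!(r in ver) || v > ver[r]) {
        ver[r] = v; st[r] = f[3]; ldr[r] = (n >= 4 && f[4] == "L")
      }
    }
    END {
      name["A"] = "active"; name["D"] = "down"
      name["R"] = "recovering"; name["F"] = "recovery_failed"
      for (r in ver) {
        s = (st[r] in name) ? name[st[r]] : st[r]
        printf "%s v%d %s%s\n", r, ver[r], s, (ldr[r] ? " leader" : "")
      }
    }
  ' | sort   # sort for a stable, readable order
}
```

Usage: `bin/solr zk ls -r /collections/mycoll/state.json -z localhost:2181 | prs_states` prints one line per replica, e.g. its newest version, decoded state and whether it is the leader.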

Stop the second node and check the PRS znodes again:
# bin/solr stop -p 9001
# docker exec -it zk1 /apache-zookeeper-3.5.6-bin/bin/zkCli.sh stat /collections/mycoll/state.json
# bin/solr zk ls -r /collections/mycoll/state.json -z localhost:2181

*Bug:* The second replica (core_node4) still shows up as "A" (active), even though its node (9001) has been stopped.
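This check can be made scriptable, so a regression test fails loudly instead of relying on reading the listing. A minimal sketch with a hypothetical `prs_expect` function (not part of Solr), assuming PRS entries are listed one per line as `<replica>:<version>:<state>[:L]` and that the highest version is the current state. After `bin/solr stop -p 9001`, the expectation is that core_node4's newest entry is D; with this bug it is still A, so the check fails.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of Solr): return 0 if the newest PRS entry for
# replica $1 has state letter $2 (e.g. A or D); otherwise print why and return 1.
# ASSUMPTION: the listing on stdin has entries like <replica>:<version>:<state>[:L],
# possibly indented or prefixed with their parent path.
prs_expect() {
  local replica="$1" want="$2" got
  got="$(awk -v r="$replica" '
    {
      line = $0
      sub(/^.*\//, "", line)          # drop any parent-path prefix
      gsub(/[[:space:]]/, "", line)   # drop indentation
      n = split(line, f, ":")
      # track the highest-version entry for the requested replica only
      if (n >= 3 && f[1] == r && f[2] ~ /^[0-9]+$/ && (!found || f[2] + 0 > best)) {
        found = 1; best = f[2] + 0; cur = f[3]
      }
    }
    END { print cur }
  ')"
  if [ "$got" = "$want" ]; then
    return 0
  fi
  echo "expected $replica to be '$want' but PRS says '${got:-<missing>}'" >&2
  return 1
}
```

Usage after stopping the second node: `bin/solr zk ls -r /collections/mycoll/state.json -z localhost:2181 | prs_expect core_node4 D`.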

Start the second node again and check the PRS znodes:
# bin/solr -c -p 9001 -z localhost:2181
# docker exec -it zk1 /apache-zookeeper-3.5.6-bin/bin/zkCli.sh stat /collections/mycoll/state.json
# bin/solr zk ls -r /collections/mycoll/state.json -z localhost:2181


This bug is fixed by the attached PR. Since it affects regular Solr users in production, not just dev testing, I'm raising this issue back to Critical; let's ship the fix in a bugfix release.

>  PRS replicas stay DOWN after a new node is restarted
> -----------------------------------------------------
>
>                 Key: SOLR-15288
>                 URL: https://issues.apache.org/jira/browse/SOLR-15288
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.8.1
>            Reporter: Ishan Chattopadhyaya
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> After a PRS collection is created on a single-node cluster, a new node is
> added and a replica for that collection is placed on the new node,
> restarting that new node causes problems with the replica states.
> Reproduce script:
> {code}
> # Start a fresh ZK on 2181
> # docker container prune -f && docker run -it -p 2181:2181 --name=zk1 -h zk1 zookeeper:3.5.6
> rm -rf server/logs/*
> bin/solr stop -all
> rm -rf server/solr/mycoll_shard1_replica_n1/ server/solr/mycoll_shard1_replica_n3/
> bin/solr -c -p 9000 -z localhost:2181
> curl "http://localhost:9000/solr/admin/collections?action=CREATE&name=mycoll&numShards=1&perReplicaState=true"
> bin/solr -c -p 9001 -z localhost:2181
> curl "http://localhost:9000/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1"
> bin/solr stop -p 9001
> bin/solr -c -p 9001 -z localhost:2181
> {code}
> Two problems:
> 1. After the restart, both replicas are DOWN.
> 2. Also, as [~hitesh.khamesra] found out, the second replica stays ACTIVE
> (not DOWN) after the second node (9001) is stopped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)