[ https://issues.apache.org/jira/browse/SOLR-17652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923822#comment-17923822 ]
Chris M. Hostetter commented on SOLR-17652:
-------------------------------------------
FYI, here's a way to demonstrate the bug using Solr's example mode (the commands
below are from 9x and need to be modified slightly to work on main)...
{noformat}
./solr/packaging/build/dev/bin/solr start -e cloud -noprompt
# Setup our collection and both types of replicas
curl -sS 'http://localhost:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=1&tlogReplicas=1&createNodeSet=localhost:7574_solr'
curl -sS 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=techproducts&shard=shard1&node=localhost:8983_solr&type=PULL'
./solr/packaging/build/dev/bin/post -c techproducts ./solr/packaging/build/dev/example/exampledocs/*.xml
# shut down pod hosting TLOG leader
./solr/packaging/build/dev/bin/solr stop -p 7574
# stop & re-start pod hosting PULL replica (and embedded zk)
./solr/packaging/build/dev/bin/solr stop -p 8983
./solr/packaging/build/dev/bin/solr start --cloud -p 8983 \
  --solr-home "/home/hossman/lucene/solr/solr/packaging/build/dev/example/cloud/node1/solr" \
  --server-dir "/home/hossman/lucene/solr/solr/packaging/build/dev/server"
# Wait ~13 min (until you see an exception like the one above in the logs)
# Bring back the TLOG leader pod...
./solr/packaging/build/dev/bin/solr start --cloud -p 7574 \
  --solr-home "/home/hossman/lucene/solr/solr/packaging/build/dev/example/cloud/node2/solr" \
  --server-dir "/home/hossman/lucene/solr/solr/packaging/build/dev/server" \
  -z 127.0.0.1:9983
# PULL replica will still stay DOWN forever
{noformat}
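To confirm the end state without digging through the logs, something along these lines might help. It is only a sketch: {{has_down_replica}} is a made-up helper name (not part of Solr), and it assumes the Collections API CLUSTERSTATUS response reports each replica with a "state" field whose value is "down" for the stuck replica.

```shell
#!/bin/sh
# Hypothetical helper, not part of Solr: reads CLUSTERSTATUS JSON on stdin and
# succeeds (exit 0) if any "state" field in it has the value "down".
# Assumption: only replicas ever report "down"; shard-level states use other values.
has_down_replica() {
  # Redirect instead of using -q so grep consumes all of its input.
  grep -E '"state" *: *"down"' >/dev/null
}

# Assumed usage against the node on port 8983 from the steps above:
#   curl -sS 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=techproducts' \
#     | has_down_replica && echo "at least one replica is still DOWN"
```

Running the check a few minutes after the TLOG leader is back should show whether the PULL replica ever recovers.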
> PULL replicas can be stuck permanently in DOWN state if leader election takes too long
> --------------------------------------------------------------------------------------
>
> Key: SOLR-17652
> URL: https://issues.apache.org/jira/browse/SOLR-17652
> Project: Solr
> Issue Type: Bug
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-17652.patch
>
>
> A bug exists in {{ZkController}} that can cause PULL replicas to be
> permanently stuck in a DOWN state (such that even a core RELOAD cannot fix
> it) if that PULL replica was initially loaded during a leader election that
> takes a significant amount of time.
>
> Details to follow in comments