I had a fairly simple plan for migrating my single Solr instance with multiple cores to a SolrCloud implementation where each core becomes a collection. My local testing (Windows) worked fine, but the first Linux (development) environment I tried to migrate had some failures. This is v5.2.1.
The setup: A single Linux box running two Solr nodes, on ports 8983 and 8984. Both are children of the only SOLRHOME folder. Just using the embedded ZK. Single-shard collections with a replication factor of 2.

  SOLRHOME/server/solr (port 8983 with embedded ZK on 9983) - solr start -c
  SOLRHOME/server/solr2 (port 8984) - solr start -c -p 8984 -s solr2 -z localhost:9983

The plan:

  1. Start Solr in cloud mode
  2. Upload the config to ZK
  3. Create the collections via the Collections API
  4. Stop Solr
  5. Copy the "data" folders from the old cores into the new collections on 8983 (/solr)
  6. Start Solr again

The first symptom of the problem showed up when I tried to stop all nodes with "solr stop -all". It only shut down node 8983. When I then tried "solr stop -p 8984", it had to kill the process. Then I noticed these errors in the log:

  "Error while trying to recover. Server refused connection at: http://10.0.5.213:8984/solr"
  "Error while trying to recover. No registered leader was found after waiting for 4000ms"

All the indexes I moved were very small - less than 1MB. Only 2 out of 5 collections replicated when Solr restarted. The only "clue" (unless it's just coincidence) is that the two that worked had 8983 as their leader node. The other 3 collections had 8984 as their leader, and that node doesn't have a ZK. It's confusing because this same plan worked on my local machine, even when a collection had 8984 as the leader.

Is there a flaw in my plan? Maybe I have to force the leaders to be on the same node as the ZK? Why didn't "solr stop -all" work?
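For reference, the plan can be sketched as a dry-run shell script that only prints the commands in order. This is a rough sketch, not a tested procedure: the collection name, config name and directory arguments are placeholders, the helper function names are made up for illustration, and the zkcli.sh location assumes the default 5.x directory layout. The final CLUSTERSTATUS call is there only because it reports which node is leader for each shard.

```shell
#!/bin/sh
# Dry-run sketch of the migration plan: prints commands, runs nothing.
# All names and paths passed in are placeholders (assumptions).
SOLR_BIN="bin/solr"
ZKCLI="server/scripts/cloud-scripts/zkcli.sh"   # assumed default 5.x location
ZK_HOST="localhost:9983"                        # embedded ZK of the 8983 node
SOLR_URL="http://localhost:8983/solr"

# Print the zkcli command that uploads a config set to ZK.
# $1 = local config directory, $2 = config name in ZK
upconfig_cmd() {
  echo "$ZKCLI -zkhost $ZK_HOST -cmd upconfig -confdir $1 -confname $2"
}

# Print the Collections API URL for a single-shard, two-replica collection.
# $1 = collection name, $2 = config name already uploaded to ZK
create_url() {
  echo "$SOLR_URL/admin/collections?action=CREATE&name=$1&numShards=1&replicationFactor=2&collection.configName=$2"
}

# Print the Collections API URL that reports leaders and replica states.
status_url() {
  echo "$SOLR_URL/admin/collections?action=CLUSTERSTATUS&wt=json"
}

# Print every step of the plan for one collection.
# $1 = collection, $2 = config name, $3 = config dir,
# $4 = old core "data" dir, $5 = new core "data" dir on the 8983 node
print_plan() {
  echo "$SOLR_BIN start -c"
  echo "$SOLR_BIN start -c -p 8984 -s solr2 -z $ZK_HOST"
  upconfig_cmd "$3" "$2"
  echo "curl '$(create_url "$1" "$2")'"
  echo "$SOLR_BIN stop -all"
  echo "cp -r $4/. $5/"
  echo "$SOLR_BIN start -c"
  echo "$SOLR_BIN start -c -p 8984 -s solr2 -z $ZK_HOST"
  echo "curl '$(status_url)'"
}
```

Calling something like `print_plan mycollection myconf /path/to/conf /path/to/old/data /path/to/new/data` lists the steps for one collection. The new core's data directory is left as an argument because which replica lands on which node is decided when the collection is created.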