[ https://issues.apache.org/jira/browse/SOLR-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532552#comment-15532552 ]
ASF subversion and git services commented on SOLR-9504:
-------------------------------------------------------

Commit ce24de5cd65726dd9593512ec4082ba81b9d7801 in lucene-solr's branch refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ce24de5 ]

SOLR-9504: A replica with an empty index becomes the leader even when other more qualified replicas are in line

> A replica with an empty index becomes the leader even when other more qualified replicas are in line
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9504
>                 URL: https://issues.apache.org/jira/browse/SOLR-9504
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: master (7.0)
>            Reporter: Shalin Shekhar Mangar
>            Priority: Critical
>              Labels: impact-high
>             Fix For: 6.3, master (7.0)
>
>         Attachments: SOLR-9504.patch
>
>
> I haven't tried branch_6x or any release yet. But this is trivially reproducible on master with the following steps:
> # Start two Solr nodes.
> # Create a collection with 1 shard, 1 replica so that one node is empty.
> # Index some documents.
> # Shut down the leader node.
> # Use the addreplica API to create a replica of the collection on the still-running node. For some reason this API hangs until you restart the other node (possibly a bug itself), so do not wait for the API to complete.
> # Restart the former leader node.
> You'll find that the replica with 0 docs has become the leader. The former leader recovers from the leader without replicating any index files. It still has the old index which has some docs.
> This is from the logs of the 0 doc replica:
> {code}
> 713102 INFO (zkCallback-4-thread-5-processing-n:127.0.1.1:7574_solr) [ ] o.a.s.c.c.ZkStateReader Updating data for [gettingstarted] from [9] to [10]
> 714377 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext Enough replicas found to continue.
> 714377 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
> 714377 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.SyncStrategy Sync replicas to http://127.0.1.1:7574/solr/gettingstarted_shard1_replica2/
> 714380 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.u.PeerSync PeerSync: core=gettingstarted_shard1_replica2 url=http://127.0.1.1:7574/solr START replicas=[http://127.0.1.1:8983/solr/gettingstarted_shard1_replica1/] nUpdates=100
> 714381 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.u.PeerSync PeerSync: core=gettingstarted_shard1_replica2 url=http://127.0.1.1:7574/solr DONE. We have no versions. sync failed.
> 714382 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
> 714382 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
> 714387 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/gettingstarted/leaders/shard1/leader after winning as /collections/gettingstarted/leader_elect/shard1/election/96579592334475268-core_node2-n_0000000001
> 714398 INFO (qtp110456297-15) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica2] o.a.s.c.ShardLeaderElectionContext I am the new leader: http://127.0.1.1:7574/solr/gettingstarted_shard1_replica2/ shard1
> {code}
> It basically tries to sync but has no versions and because it was active before (it is a new core starting up for the first time), it becomes the leader and publishes itself as active.
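
The decision path visible in the log above can be sketched in Java. This is a simplified illustration only, not Solr's actual ShardLeaderElectionContext code: the class LeaderElectionSketch, the SyncOutcome enum and the becomesLeader method are hypothetical names, and the logic mirrors nothing more than what the log messages say (the sync fails because the replica has no versions, yet it becomes leader because it counts as having been active before).

```java
// Hypothetical sketch of the decision path described by the log messages.
// None of these names are Solr APIs.
class LeaderElectionSketch {

    /** Assumed outcomes of the candidate's attempt to sync with its peers. */
    enum SyncOutcome { SUCCESS, FAILED_NO_VERSIONS, FAILED_OTHER }

    /**
     * Returns true if the candidate replica ends up as shard leader.
     *
     * @param sync            result of the sync attempt
     * @param wasActiveBefore whether the replica is considered to have been active before
     */
    static boolean becomesLeader(SyncOutcome sync, boolean wasActiveBefore) {
        if (sync == SyncOutcome.SUCCESS) {
            return true;
        }
        // Fallback quoted in the log: "We failed sync, but we have no versions
        // ... we were active before, so become leader anyway".
        if (sync == SyncOutcome.FAILED_NO_VERSIONS && wasActiveBefore) {
            return true;
        }
        // Otherwise: "moving to the next candidate".
        return false;
    }

    public static void main(String[] args) {
        // The case from the report: a brand-new, empty core that is treated as
        // having been active before.
        boolean emptyReplicaLeads = becomesLeader(SyncOutcome.FAILED_NO_VERSIONS, true);
        System.out.println("empty replica becomes leader: " + emptyReplicaLeads);
    }
}
```

In this sketch the empty replica wins only because both conditions of the fallback hold at once, which is the combination the report describes for a new core starting up for the first time.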