[
https://issues.apache.org/jira/browse/SOLR-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995755#comment-14995755
]
Ishan Chattopadhyaya edited comment on SOLR-8249 at 11/8/15 7:23 PM:
---------------------------------------------------------------------
I thought the last patch (which tested for stats on LEADER operation instead of
STATE) would fix the failures. However, this is still failing at Steve's
jenkins: http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3627/
Here's the relevant log from that test:
{noformat}
[junit4] 2> 200957 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerAutoReplicaFailoverThread Starting
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000
autoReplicaFailoverWaitAfterExpiration=30000
autoReplicaFailoverBadNodeExpiration=60000
[junit4] 2> 200957 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 0)
[junit4] 2> 200971 INFO
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000000)
[ ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer
operations
[junit4] 2> 201010 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer Starting to work on the main queue
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"recovering",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core":"core1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "numShards":"1",
[junit4] 2> "base_url":"http://node1/solr/"} current state version: 0
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Update state numShards=1 message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"recovering",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core":"core1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "numShards":"1",
[junit4] 2> "base_url":"http://node1/solr/"}
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ClusterStateMutator building a new cName: collection1
[junit4] 2> 201013 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Assigning new node to shard shard=shard1
[junit4] 2> 201013 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201013 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201481 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath:
/collections/collection1/leader_elect/shard1/election
[junit4] 2> 201481 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/collections, has occurred - updating...
[junit4] 2> 201481 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/collections, has occurred - updating...
[junit4] 2> 201483 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.LeaderElector Joined leadership election with path:
/collections/collection1/leader_elect/shard1/election/94830728503164931-node1_core1-n_0000000000
[junit4] 2> 201483 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/leaders/shard1
[junit4] 2> 201484 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node
[junit4] 2> 201486 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"} current state version: 1
[junit4] 2> 201487 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201488 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"leader",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "core":"core1"} current state version: 1
[junit4] 2> 201489 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.Overseer Overseer
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000000) closing
[junit4] 2> 201489 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.ElectionContext Canceling election
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000000
[junit4] 2> 201490 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:50527_solr
[junit4] 2> 201491 INFO (OverseerExitThread) [ ] o.a.s.c.Overseer I'm
exiting , but I'm still the leader
[junit4] 2> 201491 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.LeaderElector Joined leadership election with path:
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000001
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:50527_solr
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath: /overseer_elect/leader
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.Overseer Overseer
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000001) starting
[junit4] 2> 201494 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerAutoReplicaFailoverThread Starting
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000
autoReplicaFailoverWaitAfterExpiration=30000
autoReplicaFailoverBadNodeExpiration=60000
[junit4] 2> 201538 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer Starting to work on the main queue
[junit4] 2> 201541 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201541 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201541 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201541 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201542 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
[junit4] 2> "operation":"leader",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "core":"core1"}
[junit4] 2> 201542 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201542 INFO
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000001)
[ ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer
operations
[junit4] 2> 201543 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201551 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 1)
[junit4] 2> 201551 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 1)
{noformat}
This failed because there was a second leader election (or maybe the overseer
queue messages from the first leader election was reprocessed after the
overseer was shutdown) between the two asserts. I have not been able to
reproduce it myself, but Steve's jenkins is a very fast machine.
Now, I am thinking of changing tack, and switching back to testing for STATE
message stats. The problem was initially that because there was a RECOVERING
STATE message passed in, the leader election sent in an extra ACTIVE message
(owing to SOLR-7989). However, if we send in ACTIVE instead of RECOVERING,
there won't be another STATE message sent in during leader election (even if
there is a second leader election). This, I think will fix the test. Attaching
the patch for this.
I'll seek Steve's help and beast it on his machine to be sure.
was (Author: ichattopadhyaya):
I thought the last patch (which tested for stats on LEADER operation instead of
STATE) would fix the failures. However, this is still failing at Steve's
jenkins: http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3627/
Here's the relevant log from that test:
{noformat}
[junit4] 2> 200957 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerAutoReplicaFailoverThread Starting
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000
autoReplicaFailoverWaitAfterExpiration=30000
autoReplicaFailoverBadNodeExpiration=60000
[junit4] 2> 200957 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 0)
[junit4] 2> 200971 INFO
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000000)
[ ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer
operations
[junit4] 2> 201010 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer Starting to work on the main queue
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"recovering",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core":"core1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "numShards":"1",
[junit4] 2> "base_url":"http://node1/solr/"} current state version: 0
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Update state numShards=1 message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"recovering",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core":"core1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "numShards":"1",
[junit4] 2> "base_url":"http://node1/solr/"}
[junit4] 2> 201012 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ClusterStateMutator building a new cName: collection1
[junit4] 2> 201013 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Assigning new node to shard shard=shard1
[junit4] 2> 201013 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201013 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201481 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath:
/collections/collection1/leader_elect/shard1/election
[junit4] 2> 201481 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/collections, has occurred - updating...
[junit4] 2> 201481 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A collections change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/collections, has occurred - updating...
[junit4] 2> 201483 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.LeaderElector Joined leadership election with path:
/collections/collection1/leader_elect/shard1/election/94830728503164931-node1_core1-n_0000000000
[junit4] 2> 201483 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath: /collections/collection1/leaders/shard1
[junit4] 2> 201484 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node
[junit4] 2> 201486 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"} current state version: 1
[junit4] 2> 201487 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201488 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer processMessage: queueSize: 1, message = {
[junit4] 2> "operation":"leader",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "core":"core1"} current state version: 1
[junit4] 2> 201489 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.Overseer Overseer
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000000) closing
[junit4] 2> 201489 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.ElectionContext Canceling election
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000000
[junit4] 2> 201490 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000000) [
] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:50527_solr
[junit4] 2> 201491 INFO (OverseerExitThread) [ ] o.a.s.c.Overseer I'm
exiting , but I'm still the leader
[junit4] 2> 201491 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.LeaderElector Joined leadership election with path:
/overseer_elect/election/94830728503164930-127.0.0.1:50527_solr-n_0000000001
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:50527_solr
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.c.SolrZkClient makePath: /overseer_elect/leader
[junit4] 2> 201492 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.Overseer Overseer
(id=94830728503164930-127.0.0.1:50527_solr-n_0000000001) starting
[junit4] 2> 201494 INFO
(TEST-OverseerTest.testOverseerStatsReset-seed#[6F0C5F964D692279]) [ ]
o.a.s.c.OverseerAutoReplicaFailoverThread Starting
OverseerAutoReplicaFailoverThread autoReplicaFailoverWorkLoopDelay=10000
autoReplicaFailoverWaitAfterExpiration=30000
autoReplicaFailoverBadNodeExpiration=60000
[junit4] 2> 201538 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer Starting to work on the main queue
[junit4] 2> 201541 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201541 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.o.ReplicaMutator Update state numShards=null message={
[junit4] 2> "operation":"state",
[junit4] 2> "state":"active",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "node_name":"node1",
[junit4] 2> "core_node_name":"core_node1",
[junit4] 2> "core":"core1"}
[junit4] 2> 201541 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201541 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201542 INFO
(OverseerStateUpdate-94830728503164930-127.0.0.1:50527_solr-n_0000000001) [
] o.a.s.c.Overseer processMessage: workQueueSize: 2, message = {
[junit4] 2> "operation":"leader",
[junit4] 2> "shard":"shard1",
[junit4] 2> "collection":"collection1",
[junit4] 2> "base_url":"http://node1/solr/",
[junit4] 2> "core":"core1"}
[junit4] 2> 201542 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201542 INFO
(OverseerCollectionConfigSetProcessor-94830728503164930-127.0.0.1:50527_solr-n_0000000001)
[ ] o.a.s.c.OverseerTaskProcessor Process current queue of overseer
operations
[junit4] 2> 201543 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 1)
[junit4] 2> 201551 INFO (zkCallback-278-thread-1) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 1)
[junit4] 2> 201551 INFO (zkCallback-279-thread-2) [ ]
o.a.s.c.c.ZkStateReader A live node change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 1)
{noformat}
This failed because there was a second leader election between the two asserts.
I have not been able to reproduce it myself, but Steve's jenkins is a very fast
machine.
Now, I am thinking of changing tack, and switching back to testing for STATE
message stats. The problem was initially that because there was a RECOVERING
STATE message passed in, the leader election sent in an extra ACTIVE message
(owing to SOLR-7989). However, if we send in ACTIVE instead of RECOVERING,
there won't be another STATE message sent in during leader election (even if
there is a second leader election). This, I think will fix the test. Attaching
the patch for this.
> OverseerTest failures
> ---------------------
>
> Key: SOLR-8249
> URL: https://issues.apache.org/jira/browse/SOLR-8249
> Project: Solr
> Issue Type: Bug
> Reporter: Noble Paul
> Assignee: Noble Paul
> Attachments: SOLR-8249.patch, SOLR-8249.patch, SOLR-8249.patch,
> SOLR-8249.patch
>
>
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/5377/
> This is related to SOLR-7989 . [~ichattopadhyaya] we need to fix the testcase
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]