Benno Evers created MESOS-9259:
----------------------------------

             Summary: CoordinatorTest.RecoveryRace is flaky
                 Key: MESOS-9259
                 URL: https://issues.apache.org/jira/browse/MESOS-9259
             Project: Mesos
          Issue Type: Bug
            Reporter: Benno Evers


Observed in an internal CI system:
{noformat}
../../src/tests/log_tests.cpp:1635
electing.get() is NONE
{noformat}

Full log:
{noformat}
[ RUN      ] CoordinatorTest.RecoveryRace
I0926 00:01:58.451109 11834 leveldb.cpp:174] Opened db in 6.714471ms
I0926 00:01:58.453207 11834 leveldb.cpp:181] Compacted db in 2.068898ms
I0926 00:01:58.453239 11834 leveldb.cpp:196] Created db iterator in 4546ns
I0926 00:01:58.453249 11834 leveldb.cpp:202] Seeked to beginning of db in 879ns
I0926 00:01:58.453258 11834 leveldb.cpp:277] Iterated through 0 keys in the db 
in 730ns
I0926 00:01:58.453272 11834 replica.cpp:795] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0926 00:01:58.460438 11834 leveldb.cpp:174] Opened db in 7.111743ms
I0926 00:01:58.462488 11834 leveldb.cpp:181] Compacted db in 2.022322ms
I0926 00:01:58.462518 11834 leveldb.cpp:196] Created db iterator in 3391ns
I0926 00:01:58.462528 11834 leveldb.cpp:202] Seeked to beginning of db in 756ns
I0926 00:01:58.462536 11834 leveldb.cpp:277] Iterated through 0 keys in the db 
in 528ns
I0926 00:01:58.462548 11834 replica.cpp:795] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0926 00:01:58.469419 11834 leveldb.cpp:174] Opened db in 6.820718ms
I0926 00:01:58.471472 11834 leveldb.cpp:181] Compacted db in 2.026713ms
I0926 00:01:58.471501 11834 leveldb.cpp:196] Created db iterator in 3217ns
I0926 00:01:58.471511 11834 leveldb.cpp:202] Seeked to beginning of db in 742ns
I0926 00:01:58.471518 11834 leveldb.cpp:277] Iterated through 0 keys in the db 
in 595ns
I0926 00:01:58.471530 11834 replica.cpp:795] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0926 00:01:58.471700 11856 recover.cpp:437] Starting replica recovery
I0926 00:01:58.471700 11854 recover.cpp:437] Starting replica recovery
I0926 00:01:58.471757 11856 recover.cpp:468] Replica is in EMPTY status
I0926 00:01:58.471940 11860 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(398)@172.16.10.76:34795
I0926 00:01:58.471971 11860 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(400)@172.16.10.76:34795
I0926 00:01:58.471946 11856 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(399)@172.16.10.76:34795
I0926 00:01:58.472019 11859 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.472048 11859 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.472069 11859 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.471757 11854 recover.cpp:468] Replica is in EMPTY status
I0926 00:01:58.472158 11858 recover.cpp:564] Updating replica status to STARTING
I0926 00:01:58.472293 11855 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(401)@172.16.10.76:34795
I0926 00:01:58.474081 11858 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.874172ms
I0926 00:01:58.474103 11858 replica.cpp:322] Persisted replica status to 
STARTING
I0926 00:01:58.476917 11854 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.476929 11856 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(403)@172.16.10.76:34795
I0926 00:01:58.476965 11854 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.476977 11856 recover.cpp:437] Starting replica recovery
I0926 00:01:58.477020 11854 recover.cpp:468] Replica is in EMPTY status
I0926 00:01:58.476917 11860 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(402)@172.16.10.76:34795
I0926 00:01:58.477063 11854 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.477212 11854 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(404)@172.16.10.76:34795
I0926 00:01:58.477232 11857 recover.cpp:564] Updating replica status to STARTING
I0926 00:01:58.477238 11854 replica.cpp:677] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(406)@172.16.10.76:34795
I0926 00:01:58.477277 11857 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.477305 11857 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0926 00:01:58.477217 11860 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(405)@172.16.10.76:34795
I0926 00:01:58.478512 11856 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.478623 11858 recover.cpp:564] Updating replica status to STARTING
I0926 00:01:58.478802 11856 recover.cpp:468] Replica is in STARTING status
I0926 00:01:58.478969 11857 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(408)@172.16.10.76:34795
I0926 00:01:58.479017 11854 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.738249ms
I0926 00:01:58.479037 11854 replica.cpp:322] Persisted replica status to 
STARTING
I0926 00:01:58.479039 11857 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.479053 11854 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(407)@172.16.10.76:34795
I0926 00:01:58.479090 11854 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.480270 11858 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.597992ms
I0926 00:01:58.480294 11858 replica.cpp:322] Persisted replica status to 
STARTING
I0926 00:01:58.480311 11858 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(409)@172.16.10.76:34795
I0926 00:01:58.480379 11859 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.480451 11860 recover.cpp:564] Updating replica status to VOTING
I0926 00:01:58.482100 11860 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.608489ms
I0926 00:01:58.482122 11860 replica.cpp:322] Persisted replica status to VOTING
I0926 00:01:58.484938 11857 recover.cpp:578] Successfully joined the Paxos group
I0926 00:01:58.492822 11857 recover.cpp:447] Recover process terminated
I0926 00:01:58.493100 11853 replica.cpp:379] Replica ignoring promise request 
from __req_res__(411)@172.16.10.76:34795 as it is in STARTING status
I0926 00:01:58.493111 11859 replica.cpp:379] Replica ignoring promise request 
from __req_res__(412)@172.16.10.76:34795 as it is in STARTING status
I0926 00:01:58.493155 11854 consensus.cpp:363] Aborting implicit promise 
request because 2 ignores received
I0926 00:01:58.493278 11854 recover.cpp:468] Replica is in STARTING status
I0926 00:01:58.493374 11854 replica.cpp:677] Replica in VOTING status received 
a broadcasted recover request from __req_res__(413)@172.16.10.76:34795
I0926 00:01:58.493397 11854 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(414)@172.16.10.76:34795
I0926 00:01:58.493413 11854 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(415)@172.16.10.76:34795
I0926 00:01:58.493443 11856 recover.cpp:197] Received a recover response from a 
replica in VOTING status
I0926 00:01:58.493474 11856 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.493487 11856 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.493535 11854 recover.cpp:564] Updating replica status to VOTING
I0926 00:01:58.493664 11854 recover.cpp:578] Successfully joined the Paxos group
I0926 00:01:58.493705 11854 recover.cpp:447] Recover process terminated
I0926 00:01:58.493849 11857 replica.cpp:497] Replica received implicit promise 
request from __req_res__(410)@172.16.10.76:34795 with proposal 1
I0926 00:01:58.493960 11854 replica.cpp:379] Replica ignoring promise request 
from __req_res__(418)@172.16.10.76:34795 as it is in STARTING status
I0926 00:01:58.495218 11857 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.347335ms
I0926 00:01:58.495218 11859 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.626386ms
I0926 00:01:58.495255 11859 replica.cpp:322] Persisted replica status to VOTING
I0926 00:01:58.495276 11859 replica.cpp:497] Replica received implicit promise 
request from __req_res__(417)@172.16.10.76:34795 with proposal 1
I0926 00:01:58.495244 11857 replica.cpp:344] Persisted promised to 1
I0926 00:01:58.495347 11857 replica.cpp:497] Replica received implicit promise 
request from __req_res__(416)@172.16.10.76:34795 with proposal 1
I0926 00:01:58.495374 11857 replica.cpp:502] Replica denying promise request 
with proposal 1
I0926 00:01:58.496754 11859 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.461595ms
I0926 00:01:58.496778 11859 replica.cpp:344] Persisted promised to 1
../../src/tests/log_tests.cpp:1635: Failure
electing.get() is NONE
I0926 00:01:58.500936 11853 recover.cpp:468] Replica is in STARTING status
I0926 00:01:58.501052 11853 replica.cpp:677] Replica in VOTING status received 
a broadcasted recover request from __req_res__(419)@172.16.10.76:34795
I0926 00:01:58.501060 11856 replica.cpp:677] Replica in VOTING status received 
a broadcasted recover request from __req_res__(420)@172.16.10.76:34795
I0926 00:01:58.501164 11858 replica.cpp:677] Replica in STARTING status 
received a broadcasted recover request from __req_res__(421)@172.16.10.76:34795
I0926 00:01:58.501180 11856 recover.cpp:197] Received a recover response from a 
replica in VOTING status
I0926 00:01:58.501204 11856 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0926 00:01:58.501224 11856 recover.cpp:197] Received a recover response from a 
replica in VOTING status
I0926 00:01:58.501315 11857 recover.cpp:564] Updating replica status to 
RECOVERING
I0926 00:01:58.502955 11857 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.590904ms
I0926 00:01:58.502980 11857 replica.cpp:322] Persisted replica status to 
RECOVERING
I0926 00:01:58.503018 11857 recover.cpp:543] Starting catch-up from position 0 
to 0
I0926 00:01:58.503204 11859 replica.cpp:391] Replica received explicit promise 
request from __req_res__(422)@172.16.10.76:34795 for position 0 with proposal 0
I0926 00:01:58.503213 11857 replica.cpp:391] Replica received explicit promise 
request from __req_res__(423)@172.16.10.76:34795 for position 0 with proposal 0
I0926 00:01:58.503254 11853 replica.cpp:379] Replica ignoring promise request 
from __req_res__(424)@172.16.10.76:34795 as it is in RECOVERING status
I0926 00:01:58.648479 11857 replica.cpp:391] Replica received explicit promise 
request from __req_res__(426)@172.16.10.76:34795 for position 0 with proposal 2
I0926 00:01:58.648479 11855 replica.cpp:379] Replica ignoring promise request 
from __req_res__(427)@172.16.10.76:34795 as it is in RECOVERING status
I0926 00:01:58.648536 11854 replica.cpp:391] Replica received explicit promise 
request from __req_res__(425)@172.16.10.76:34795 for position 0 with proposal 2
I0926 00:01:58.650311 11857 leveldb.cpp:347] Persisting action (8 bytes) to 
leveldb took 1.766643ms
I0926 00:01:58.650342 11857 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.650415 11854 leveldb.cpp:347] Persisting action (8 bytes) to 
leveldb took 1.795307ms
I0926 00:01:58.650430 11854 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.656956 11857 replica.cpp:541] Replica received write request for 
position 0 from __req_res__(429)@172.16.10.76:34795
I0926 00:01:58.656965 11853 replica.cpp:541] Replica received write request for 
position 0 from __req_res__(428)@172.16.10.76:34795
I0926 00:01:58.656991 11853 leveldb.cpp:460] Reading position from leveldb took 
9438ns
I0926 00:01:58.656991 11857 leveldb.cpp:460] Reading position from leveldb took 
12074ns
I0926 00:01:58.657078 11856 replica.cpp:529] Replica ignoring write request 
from __req_res__(430)@172.16.10.76:34795 as it is in RECOVERING status
I0926 00:01:58.658516 11857 leveldb.cpp:347] Persisting action (14 bytes) to 
leveldb took 1.493967ms
I0926 00:01:58.658538 11857 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.658660 11853 leveldb.cpp:347] Persisting action (14 bytes) to 
leveldb took 1.642791ms
I0926 00:01:58.658680 11853 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.664849 11857 replica.cpp:695] Replica received learned notice 
for position 0 from log-network(29)@172.16.10.76:34795
I0926 00:01:58.664852 11858 replica.cpp:695] Replica received learned notice 
for position 0 from log-network(29)@172.16.10.76:34795
I0926 00:01:58.664913 11860 replica.cpp:695] Replica received learned notice 
for position 0 from log-network(29)@172.16.10.76:34795
I0926 00:01:58.666600 11857 leveldb.cpp:347] Persisting action (16 bytes) to 
leveldb took 1.721702ms
I0926 00:01:58.666626 11857 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.666649 11858 leveldb.cpp:347] Persisting action (16 bytes) to 
leveldb took 1.765314ms
I0926 00:01:58.666662 11858 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.666679 11860 leveldb.cpp:347] Persisting action (16 bytes) to 
leveldb took 1.749498ms
I0926 00:01:58.666694 11860 replica.cpp:712] Persisted action NOP at position 0
I0926 00:01:58.672904 11858 recover.cpp:564] Updating replica status to VOTING
I0926 00:01:58.674584 11858 leveldb.cpp:310] Persisting metadata (8 bytes) to 
leveldb took 1.623281ms
I0926 00:01:58.674607 11858 replica.cpp:322] Persisted replica status to VOTING
I0926 00:01:58.674633 11858 recover.cpp:578] Successfully joined the Paxos group
I0926 00:01:58.674685 11858 recover.cpp:447] Recover process terminated
../../src/tests/log_tests.cpp:1648: Failure
appending.get() is NONE
[  FAILED  ] CoordinatorTest.RecoveryRace (231 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to