Yan Xu created MESOS-1250:
-----------------------------
Summary: ReservationAllocatorTest.ResourcesReturned flaky -
repeated_field.h:824 CHECK failed: (index) < (size())
Key: MESOS-1250
URL: https://issues.apache.org/jira/browse/MESOS-1250
Project: Mesos
Issue Type: Bug
Reporter: Yan Xu
{noformat:title=The CHECK failure}
[libprotobuf FATAL
../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
CHECK failed: (index) < (size()):
{noformat}
{noformat:title=Full log}
[ RUN ] ReservationAllocatorTest.ResourcesReturned
I0425 18:24:35.375020 23940 leveldb.cpp:174] Opened db in 80.11386ms
I0425 18:24:35.385205 23940 leveldb.cpp:181] Compacted db in 10.16346ms
I0425 18:24:35.385226 23940 leveldb.cpp:196] Created db iterator in 3534ns
I0425 18:24:35.385233 23940 leveldb.cpp:202] Seeked to beginning of db in 513ns
I0425 18:24:35.385239 23940 leveldb.cpp:271] Iterated through 0 keys in the db
in 200ns
I0425 18:24:35.385251 23940 replica.cpp:729] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0425 18:24:35.385484 23963 recover.cpp:425] Starting replica recovery
I0425 18:24:35.385722 23961 recover.cpp:451] Replica is in EMPTY status
I0425 18:24:35.385973 23963 master.cpp:266] Master
20140425-182435-1032504131-35984-23940 (juno.apache.org) started on
67.195.138.61:35984
I0425 18:24:35.385993 23963 master.cpp:304] Master only allowing authenticated
frameworks to register
I0425 18:24:35.386001 23963 credentials.hpp:35] Loading credentials for
authentication
W0425 18:24:35.386051 23963 credentials.hpp:48] Failed to stat credentials file
'file:///tmp/ReservationAllocatorTest_ResourcesReturned_K72a9s/credentials': No
such file or directory
I0425 18:24:35.386209 23966 replica.cpp:626] Replica in EMPTY status received a
broadcasted recover request
I0425 18:24:35.386451 23964 recover.cpp:188] Received a recover response from a
replica in EMPTY status
I0425 18:24:35.386603 23966 hierarchical_allocator_process.hpp:302]
Initializing hierarchical allocator process with master :
[email protected]:35984
I0425 18:24:35.386672 23966 master.cpp:104] No whitelist given. Advertising
offers for all slaves
I0425 18:24:35.386672 23964 recover.cpp:542] Updating replica status to STARTING
I0425 18:24:35.387045 23968 master.cpp:918] The newly elected leader is
[email protected]:35984 with id 20140425-182435-1032504131-35984-23940
I0425 18:24:35.387058 23968 master.cpp:928] Elected as the leading master!
I0425 18:24:35.387064 23968 master.cpp:749] Recovering from registrar
I0425 18:24:35.387130 23968 registrar.cpp:275] Recovering registrar
I0425 18:24:35.416841 23964 leveldb.cpp:304] Persisting metadata (8 bytes) to
leveldb took 30.043801ms
I0425 18:24:35.416859 23964 replica.cpp:320] Persisted replica status to
STARTING
I0425 18:24:35.416947 23964 recover.cpp:451] Replica is in STARTING status
I0425 18:24:35.417343 23968 replica.cpp:626] Replica in STARTING status
received a broadcasted recover request
I0425 18:24:35.417513 23968 recover.cpp:188] Received a recover response from a
replica in STARTING status
I0425 18:24:35.417755 23967 recover.cpp:542] Updating replica status to VOTING
I0425 18:24:35.428218 23968 leveldb.cpp:304] Persisting metadata (8 bytes) to
leveldb took 10.386348ms
I0425 18:24:35.428267 23968 replica.cpp:320] Persisted replica status to VOTING
I0425 18:24:35.428336 23967 recover.cpp:556] Successfully joined the Paxos group
I0425 18:24:35.428447 23967 recover.cpp:440] Recover process terminated
I0425 18:24:35.428591 23967 log.cpp:656] Attempting to start the writer
I0425 18:24:35.429041 23966 replica.cpp:474] Replica received implicit promise
request with proposal 1
I0425 18:24:35.436518 23966 leveldb.cpp:304] Persisting metadata (8 bytes) to
leveldb took 7.457582ms
I0425 18:24:35.436539 23966 replica.cpp:342] Persisted promised to 1
I0425 18:24:35.436861 23964 coordinator.cpp:229] Coordinator attemping to fill
missing position
I0425 18:24:35.436869 23967 hierarchical_allocator_process.hpp:726] No
resources available to allocate!
I0425 18:24:35.436878 23967 hierarchical_allocator_process.hpp:688] Performed
allocation for 0 slaves in 15305ns
I0425 18:24:35.437417 23968 replica.cpp:375] Replica received explicit promise
request for position 0 with proposal 2
I0425 18:24:35.444845 23968 leveldb.cpp:341] Persisting action (8 bytes) to
leveldb took 7.408147ms
I0425 18:24:35.444866 23968 replica.cpp:664] Persisted action at 0
I0425 18:24:35.445366 23961 replica.cpp:508] Replica received write request for
position 0
I0425 18:24:35.445392 23961 leveldb.cpp:436] Reading position from leveldb took
12659ns
I0425 18:24:35.453176 23961 leveldb.cpp:341] Persisting action (14 bytes) to
leveldb took 7.767811ms
I0425 18:24:35.453198 23961 replica.cpp:664] Persisted action at 0
I0425 18:24:35.453548 23965 replica.cpp:643] Replica received learned notice
for position 0
I0425 18:24:35.461506 23965 leveldb.cpp:341] Persisting action (16 bytes) to
leveldb took 7.919252ms
I0425 18:24:35.461527 23965 replica.cpp:664] Persisted action at 0
I0425 18:24:35.461536 23965 replica.cpp:649] Replica learned NOP action at
position 0
I0425 18:24:35.461809 23966 log.cpp:672] Writer started with ending position 0
I0425 18:24:35.462334 23961 leveldb.cpp:436] Reading position from leveldb took
15347ns
I0425 18:24:35.464083 23967 registrar.cpp:308] Successfully recovered registrar
I0425 18:24:35.464109 23967 registrar.cpp:379] Attempting to update the
'registry'
I0425 18:24:35.465710 23962 log.cpp:680] Attempting to append 137 bytes to the
log
I0425 18:24:35.465852 23963 coordinator.cpp:339] Coordinator attempting to
write APPEND action at position 1
I0425 18:24:35.466189 23968 replica.cpp:508] Replica received write request for
position 1
I0425 18:24:35.475623 23968 leveldb.cpp:341] Persisting action (156 bytes) to
leveldb took 9.417683ms
I0425 18:24:35.475642 23968 replica.cpp:664] Persisted action at 1
I0425 18:24:35.475893 23968 replica.cpp:643] Replica received learned notice
for position 1
I0425 18:24:35.483952 23968 leveldb.cpp:341] Persisting action (158 bytes) to
leveldb took 8.039894ms
I0425 18:24:35.483976 23968 replica.cpp:664] Persisted action at 1
I0425 18:24:35.483985 23968 replica.cpp:649] Replica learned APPEND action at
position 1
I0425 18:24:35.484416 23965 registrar.cpp:427] Successfully updated 'registry'
I0425 18:24:35.484467 23968 log.cpp:699] Attempting to truncate the log to 1
I0425 18:24:35.484530 23962 coordinator.cpp:339] Coordinator attempting to
write TRUNCATE action at position 2
I0425 18:24:35.484622 23961 master.cpp:776] Recovered 0 slaves from the
Registry (99B) ; allowing 10mins for slaves to re-register
I0425 18:24:35.485131 23968 replica.cpp:508] Replica received write request for
position 2
I0425 18:24:35.487083 23965 hierarchical_allocator_process.hpp:726] No
resources available to allocate!
I0425 18:24:35.487120 23965 hierarchical_allocator_process.hpp:688] Performed
allocation for 0 slaves in 51286ns
I0425 18:24:35.487566 23963 slave.cpp:130] Slave started on
6)@67.195.138.61:35984
I0425 18:24:35.487763 23963 slave.cpp:218] Slave resources: cpus(role1):1;
mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200;
disk(*):0; ports(*):[31000-32000]
I0425 18:24:35.487838 23963 slave.cpp:246] Slave hostname: juno.apache.org
I0425 18:24:35.487845 23963 slave.cpp:247] Slave checkpoint: false
I0425 18:24:35.488287 23965 state.cpp:33] Recovering state from
'/tmp/ReservationAllocatorTest_ResourcesReturned_cRsGAx/meta'
I0425 18:24:35.488456 23963 status_update_manager.cpp:193] Recovering status
update manager
I0425 18:24:35.488628 23963 slave.cpp:2804] Finished recovery
I0425 18:24:35.488865 23963 slave.cpp:499] New master detected at
[email protected]:35984
I0425 18:24:35.488914 23963 slave.cpp:733] Will retry registration in
7.800343128secs if necessary
I0425 18:24:35.488931 23963 slave.cpp:524] Detecting new master
I0425 18:24:35.488968 23963 status_update_manager.cpp:167] New master detected
at [email protected]:35984
I0425 18:24:35.489169 23963 registrar.cpp:379] Attempting to update the
'registry'
I0425 18:24:35.494820 23968 leveldb.cpp:341] Persisting action (16 bytes) to
leveldb took 9.670136ms
I0425 18:24:35.494839 23968 replica.cpp:664] Persisted action at 2
I0425 18:24:35.495146 23962 replica.cpp:643] Replica received learned notice
for position 2
I0425 18:24:35.503154 23962 leveldb.cpp:341] Persisting action (18 bytes) to
leveldb took 7.987444ms
I0425 18:24:35.503190 23962 leveldb.cpp:399] Deleting ~1 keys from leveldb took
16160ns
I0425 18:24:35.503201 23962 replica.cpp:664] Persisted action at 2
I0425 18:24:35.503209 23962 replica.cpp:649] Replica learned TRUNCATE action at
position 2
I0425 18:24:35.503527 23961 log.cpp:680] Attempting to append 438 bytes to the
log
I0425 18:24:35.503597 23968 coordinator.cpp:339] Coordinator attempting to
write APPEND action at position 3
I0425 18:24:35.503970 23967 replica.cpp:508] Replica received write request for
position 3
I0425 18:24:35.511482 23967 leveldb.cpp:341] Persisting action (457 bytes) to
leveldb took 7.473886ms
I0425 18:24:35.511502 23967 replica.cpp:664] Persisted action at 3
I0425 18:24:35.511749 23968 replica.cpp:643] Replica received learned notice
for position 3
I0425 18:24:35.519814 23968 leveldb.cpp:341] Persisting action (459 bytes) to
leveldb took 8.043537ms
I0425 18:24:35.519836 23968 replica.cpp:664] Persisted action at 3
I0425 18:24:35.519845 23968 replica.cpp:649] Replica learned APPEND action at
position 3
I0425 18:24:35.520158 23965 registrar.cpp:427] Successfully updated 'registry'
I0425 18:24:35.520236 23964 log.cpp:699] Attempting to truncate the log to 3
I0425 18:24:35.520279 23965 master.cpp:2139] Admitted slave on juno.apache.org
at slave(6)@67.195.138.61:35984
I0425 18:24:35.520285 23964 coordinator.cpp:339] Coordinator attempting to
write TRUNCATE action at position 4
I0425 18:24:35.520306 23965 master.cpp:3213] Adding slave
20140425-182435-1032504131-35984-23940-0 at juno.apache.org with cpus(role1):1;
mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200;
disk(*):0; ports(*):[31000-32000]
I0425 18:24:35.520434 23967 slave.cpp:542] Registered with master
[email protected]:35984; given slave ID
20140425-182435-1032504131-35984-23940-0
[libprotobuf FATAL
../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
CHECK failed: (index) < (size()):
I0425 18:24:35.520853 23964 replica.cpp:508] Replica received write request for
position 4
I0425 18:24:35.520884 23963 hierarchical_allocator_process.hpp:445] Added slave
20140425-182435-1032504131-35984-23940-0 (juno.apache.org) with cpus(role1):1;
mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200;
disk(*):0; ports(*):[31000-32000] (and cpus(role1):1; mem(role1):200;
cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; disk(*):0;
ports(*):[31000-32000] available)
I0425 18:24:35.520987 23963 hierarchical_allocator_process.hpp:708] Performed
allocation for slave 20140425-182435-1032504131-35984-23940-0 in 17123ns
../../src/tests/allocator_tests.cpp:523: Failure
Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_,
_, _))...
Expected: to be called twice
Actual: called once - unsatisfied and active
unknown file: Failure
C++ exception with description "CHECK failed: (index) < (size()): " thrown in
the test body.
I0425 18:24:35.521409 23940 master.cpp:547] Master terminating
I0425 18:24:35.521536 23965 slave.cpp:2012] [email protected]:35984 exited
W0425 18:24:35.521553 23965 slave.cpp:2015] Master disconnected! Waiting for a
new master to be elected
I0425 18:24:35.528180 23964 leveldb.cpp:341] Persisting action (16 bytes) to
leveldb took 7.286467ms
I0425 18:24:35.528199 23964 replica.cpp:664] Persisted action at 4
I0425 18:24:35.528659 23940 slave.cpp:388] Slave terminating
[ FAILED ] ReservationAllocatorTest.ResourcesReturned (235 ms)
{noformat}
{noformat:title=This is only the result of the CHECK failure and irrelevant}
Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_,
_, _))...
Expected: to be called twice
Actual: called once - unsatisfied and active
{noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)