Yan Xu created MESOS-1437:
-----------------------------
Summary: SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch is
flaky
Key: MESOS-1437
URL: https://issues.apache.org/jira/browse/MESOS-1437
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.19.0
Environment: ubuntu-14.04-gcc
Reporter: Yan Xu
{noformat}
[ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
Using temporary directory
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_lUbOAD'
I0529 09:29:24.491727 18175 leveldb.cpp:176] Opened db in 3.742412ms
I0529 09:29:24.493413 18175 leveldb.cpp:183] Compacted db in 1.183942ms
I0529 09:29:24.493865 18175 leveldb.cpp:198] Created db iterator in 6242ns
I0529 09:29:24.494138 18175 leveldb.cpp:204] Seeked to beginning of db in 1213ns
I0529 09:29:24.494526 18175 leveldb.cpp:273] Iterated through 0 keys in the db
in 638ns
I0529 09:29:24.494839 18175 replica.cpp:741] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0529 09:29:24.495407 18197 recover.cpp:425] Starting replica recovery
I0529 09:29:24.495491 18197 recover.cpp:451] Replica is in EMPTY status
I0529 09:29:24.495844 18197 replica.cpp:638] Replica in EMPTY status received a
broadcasted recover request
I0529 09:29:24.495916 18197 recover.cpp:188] Received a recover response from a
replica in EMPTY status
I0529 09:29:24.496022 18197 recover.cpp:542] Updating replica status to STARTING
I0529 09:29:24.497161 18194 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 844433ns
I0529 09:29:24.497185 18194 replica.cpp:320] Persisted replica status to
STARTING
I0529 09:29:24.497251 18194 recover.cpp:451] Replica is in STARTING status
I0529 09:29:24.497478 18194 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I0529 09:29:24.497534 18194 recover.cpp:188] Received a recover response from a
replica in STARTING status
I0529 09:29:24.497622 18194 recover.cpp:542] Updating replica status to VOTING
I0529 09:29:24.498047 18194 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 376402ns
I0529 09:29:24.498067 18194 replica.cpp:320] Persisted replica status to VOTING
I0529 09:29:24.498108 18194 recover.cpp:556] Successfully joined the Paxos group
I0529 09:29:24.498159 18194 recover.cpp:440] Recover process terminated
I0529 09:29:24.503008 18196 master.cpp:272] Master
20140529-092924-16842879-48472-18175 (trusty) started on 127.0.1.1:48472
I0529 09:29:24.503432 18196 master.cpp:309] Master only allowing authenticated
frameworks to register
I0529 09:29:24.503765 18196 master.cpp:314] Master only allowing authenticated
slaves to register
I0529 09:29:24.504178 18196 credentials.hpp:35] Loading credentials for
authentication from
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_lUbOAD/credentials'
I0529 09:29:24.504432 18196 master.cpp:340] Master enabling authorization
I0529 09:29:24.504879 18200 hierarchical_allocator_process.hpp:301]
Initializing hierarchical allocator process with master : [email protected]:48472
I0529 09:29:24.505163 18198 master.cpp:108] No whitelist given. Advertising
offers for all slaves
I0529 09:29:24.506216 18196 master.cpp:957] The newly elected leader is
[email protected]:48472 with id 20140529-092924-16842879-48472-18175
I0529 09:29:24.506657 18196 master.cpp:970] Elected as the leading master!
I0529 09:29:24.506963 18196 master.cpp:788] Recovering from registrar
I0529 09:29:24.507419 18201 registrar.cpp:313] Recovering registrar
I0529 09:29:24.507659 18194 log.cpp:656] Attempting to start the writer
I0529 09:29:24.507969 18194 replica.cpp:474] Replica received implicit promise
request with proposal 1
I0529 09:29:24.508533 18194 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 540486ns
I0529 09:29:24.508553 18194 replica.cpp:342] Persisted promised to 1
I0529 09:29:24.508718 18194 coordinator.cpp:230] Coordinator attemping to fill
missing position
I0529 09:29:24.509006 18194 replica.cpp:375] Replica received explicit promise
request for position 0 with proposal 2
I0529 09:29:24.509305 18194 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 278308ns
I0529 09:29:24.509552 18194 replica.cpp:676] Persisted action at 0
I0529 09:29:24.510025 18194 replica.cpp:508] Replica received write request for
position 0
I0529 09:29:24.510303 18194 leveldb.cpp:438] Reading position from leveldb took
16831ns
I0529 09:29:24.510577 18194 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 252132ns
I0529 09:29:24.510615 18194 replica.cpp:676] Persisted action at 0
I0529 09:29:24.514835 18197 replica.cpp:655] Replica received learned notice
for position 0
I0529 09:29:24.515504 18197 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 639727ns
I0529 09:29:24.515524 18197 replica.cpp:676] Persisted action at 0
I0529 09:29:24.515534 18197 replica.cpp:661] Replica learned NOP action at
position 0
I0529 09:29:24.515691 18197 log.cpp:672] Writer started with ending position 0
I0529 09:29:24.516027 18198 leveldb.cpp:438] Reading position from leveldb took
11426ns
I0529 09:29:24.517470 18198 registrar.cpp:346] Successfully fetched the
registry (0B)
I0529 09:29:24.517500 18198 registrar.cpp:422] Attempting to update the
'registry'
I0529 09:29:24.519122 18199 log.cpp:680] Attempting to append 119 bytes to the
log
I0529 09:29:24.519177 18199 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
I0529 09:29:24.519358 18199 replica.cpp:508] Replica received write request for
position 1
I0529 09:29:24.520508 18199 leveldb.cpp:343] Persisting action (136 bytes) to
leveldb took 1.12638ms
I0529 09:29:24.520544 18199 replica.cpp:676] Persisted action at 1
I0529 09:29:24.520712 18199 replica.cpp:655] Replica received learned notice
for position 1
I0529 09:29:24.521198 18199 leveldb.cpp:343] Persisting action (138 bytes) to
leveldb took 464062ns
I0529 09:29:24.521219 18199 replica.cpp:676] Persisted action at 1
I0529 09:29:24.521229 18199 replica.cpp:661] Replica learned APPEND action at
position 1
I0529 09:29:24.521467 18199 registrar.cpp:479] Successfully updated 'registry'
I0529 09:29:24.521507 18199 registrar.cpp:372] Successfully recovered registrar
I0529 09:29:24.521543 18199 log.cpp:699] Attempting to truncate the log to 1
I0529 09:29:24.521605 18199 master.cpp:815] Recovered 0 slaves from the
Registry (83B) ; allowing 10mins for slaves to re-register
I0529 09:29:24.521646 18199 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I0529 09:29:24.521832 18199 replica.cpp:508] Replica received write request for
position 2
I0529 09:29:24.522260 18199 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 405954ns
I0529 09:29:24.522593 18199 replica.cpp:676] Persisted action at 2
I0529 09:29:24.522948 18199 replica.cpp:655] Replica received learned notice
for position 2
I0529 09:29:24.524375 18199 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 1.142005ms
I0529 09:29:24.524786 18199 leveldb.cpp:401] Deleting ~1 keys from leveldb took
24473ns
I0529 09:29:24.525178 18199 replica.cpp:676] Persisted action at 2
I0529 09:29:24.525440 18199 replica.cpp:661] Replica learned TRUNCATE action at
position 2
I0529 09:29:24.528745 18175 sched.cpp:126] Version: 0.19.0
I0529 09:29:24.529137 18199 slave.cpp:143] Slave started on 136)@127.0.1.1:48472
I0529 09:29:24.529464 18199 credentials.hpp:35] Loading credentials for
authentication from
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/credential'
I0529 09:29:24.529770 18199 slave.cpp:242] Slave using credential for:
test-principal
I0529 09:29:24.529885 18199 slave.cpp:255] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0529 09:29:24.529955 18199 slave.cpp:283] Slave hostname: trusty
I0529 09:29:24.529968 18199 slave.cpp:284] Slave checkpoint: true
I0529 09:29:24.530727 18199 state.cpp:33] Recovering state from
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta'
I0529 09:29:24.530822 18199 status_update_manager.cpp:193] Recovering status
update manager
I0529 09:29:24.530894 18199 slave.cpp:3018] Finished recovery
I0529 09:29:24.531134 18199 slave.cpp:536] New master detected at
[email protected]:48472
I0529 09:29:24.531162 18199 slave.cpp:612] Authenticating with master
[email protected]:48472
I0529 09:29:24.531199 18199 slave.cpp:585] Detecting new master
I0529 09:29:24.531229 18199 status_update_manager.cpp:167] New master detected
at [email protected]:48472
I0529 09:29:24.531265 18199 authenticatee.hpp:128] Creating new client SASL
connection
I0529 09:29:24.531412 18199 master.cpp:2986] Authenticating
slave(136)@127.0.1.1:48472
I0529 09:29:24.531489 18199 authenticator.hpp:156] Creating new server SASL
connection
I0529 09:29:24.531599 18199 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0529 09:29:24.531620 18199 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0529 09:29:24.531644 18199 authenticator.hpp:262] Received SASL authentication
start
I0529 09:29:24.531680 18199 authenticator.hpp:384] Authentication requires more
steps
I0529 09:29:24.531707 18199 authenticatee.hpp:265] Received SASL authentication
step
I0529 09:29:24.531741 18199 authenticator.hpp:290] Received SASL authentication
step
I0529 09:29:24.531759 18199 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0529 09:29:24.531767 18199 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0529 09:29:24.531780 18199 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0529 09:29:24.531790 18199 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0529 09:29:24.531797 18199 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.531803 18199 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.531827 18199 authenticator.hpp:376] Authentication success
I0529 09:29:24.531857 18199 authenticatee.hpp:305] Authentication success
I0529 09:29:24.531882 18199 master.cpp:3026] Successfully authenticated
principal 'test-principal' at slave(136)@127.0.1.1:48472
I0529 09:29:24.531945 18199 slave.cpp:669] Successfully authenticated with
master [email protected]:48472
I0529 09:29:24.531986 18199 slave.cpp:902] Will retry registration in
3.076227ms if necessary
I0529 09:29:24.532193 18196 master.cpp:2302] Registering slave at
slave(136)@127.0.1.1:48472 (trusty) with id
20140529-092924-16842879-48472-18175-0
I0529 09:29:24.532804 18197 registrar.cpp:422] Attempting to update the
'registry'
I0529 09:29:24.533882 18201 log.cpp:680] Attempting to append 292 bytes to the
log
I0529 09:29:24.533936 18201 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 3
I0529 09:29:24.534139 18201 replica.cpp:508] Replica received write request for
position 3
I0529 09:29:24.534694 18201 leveldb.cpp:343] Persisting action (311 bytes) to
leveldb took 532639ns
I0529 09:29:24.534715 18201 replica.cpp:676] Persisted action at 3
I0529 09:29:24.535266 18194 replica.cpp:655] Replica received learned notice
for position 3
I0529 09:29:24.536165 18196 sched.cpp:222] New master detected at
[email protected]:48472
I0529 09:29:24.536190 18196 sched.cpp:273] Authenticating with master
[email protected]:48472
I0529 09:29:24.536586 18194 leveldb.cpp:343] Persisting action (313 bytes) to
leveldb took 1.293153ms
I0529 09:29:24.536613 18194 replica.cpp:676] Persisted action at 3
I0529 09:29:24.536623 18194 replica.cpp:661] Replica learned APPEND action at
position 3
I0529 09:29:24.536893 18194 registrar.cpp:479] Successfully updated 'registry'
I0529 09:29:24.536964 18194 log.cpp:699] Attempting to truncate the log to 3
I0529 09:29:24.537014 18194 master.cpp:2342] Registered slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
I0529 09:29:24.537031 18194 master.cpp:3472] Adding slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0529 09:29:24.537122 18194 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 4
I0529 09:29:24.537215 18194 hierarchical_allocator_process.hpp:444] Added slave
20140529-092924-16842879-48472-18175-0 (trusty) with cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000] available)
I0529 09:29:24.537257 18194 hierarchical_allocator_process.hpp:707] Performed
allocation for slave 20140529-092924-16842879-48472-18175-0 in 9613ns
I0529 09:29:24.537389 18194 replica.cpp:508] Replica received write request for
position 4
I0529 09:29:24.536259 18196 authenticatee.hpp:128] Creating new client SASL
connection
I0529 09:29:24.538015 18196 master.cpp:2986] Authenticating
scheduler(130)@127.0.1.1:48472
I0529 09:29:24.538091 18196 authenticator.hpp:156] Creating new server SASL
connection
I0529 09:29:24.538194 18196 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0529 09:29:24.538214 18196 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0529 09:29:24.538238 18196 authenticator.hpp:262] Received SASL authentication
start
I0529 09:29:24.538272 18196 authenticator.hpp:384] Authentication requires more
steps
I0529 09:29:24.538297 18196 authenticatee.hpp:265] Received SASL authentication
step
I0529 09:29:24.538331 18196 authenticator.hpp:290] Received SASL authentication
step
I0529 09:29:24.538346 18196 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0529 09:29:24.538354 18196 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0529 09:29:24.538365 18196 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0529 09:29:24.538374 18196 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0529 09:29:24.538383 18196 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.538388 18196 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.538399 18196 authenticator.hpp:376] Authentication success
I0529 09:29:24.538424 18196 authenticatee.hpp:305] Authentication success
I0529 09:29:24.538570 18195 sched.cpp:347] Successfully authenticated with
master [email protected]:48472
I0529 09:29:24.538591 18195 sched.cpp:466] Sending registration request to
[email protected]:48472
I0529 09:29:24.538947 18201 slave.cpp:902] Will retry registration in
14.852555ms if necessary
I0529 09:29:24.539002 18201 slave.cpp:703] Registered with master
[email protected]:48472; given slave ID 20140529-092924-16842879-48472-18175-0
I0529 09:29:24.539147 18201 slave.cpp:716] Checkpointing SlaveInfo to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/slave.info'
I0529 09:29:24.539445 18194 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 2.035571ms
I0529 09:29:24.539471 18194 replica.cpp:676] Persisted action at 4
I0529 09:29:24.539649 18194 replica.cpp:655] Replica received learned notice
for position 4
I0529 09:29:24.539990 18196 master.cpp:3026] Successfully authenticated
principal 'test-principal' at scheduler(130)@127.0.1.1:48472
I0529 09:29:24.540099 18196 master.cpp:1041] Received registration request from
scheduler(130)@127.0.1.1:48472
I0529 09:29:24.540127 18196 master.cpp:1059] Registering framework
20140529-092924-16842879-48472-18175-0000 at scheduler(130)@127.0.1.1:48472
I0529 09:29:24.540189 18196 master.cpp:2278] Slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
already registered, resending acknowledgement
I0529 09:29:24.540230 18196 sched.cpp:397] Framework registered with
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.540249 18196 sched.cpp:411] Scheduler::registered took 9083ns
I0529 09:29:24.540282 18196 hierarchical_allocator_process.hpp:331] Added
framework 20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.540318 18196 hierarchical_allocator_process.hpp:751] Offering
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 to framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.540398 18196 hierarchical_allocator_process.hpp:687] Performed
allocation for 1 slaves in 103378ns
W0529 09:29:24.540426 18196 slave.cpp:728] Already registered with master
[email protected]:48472
I0529 09:29:24.540478 18196 master.hpp:683] Adding offer
20140529-092924-16842879-48472-18175-0 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 (trusty)
I0529 09:29:24.540520 18196 master.cpp:2933] Sending 1 offers to framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.540613 18196 sched.cpp:534] Scheduler::resourceOffers took
13742ns
I0529 09:29:24.541463 18194 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 1.792617ms
I0529 09:29:24.541507 18194 leveldb.cpp:401] Deleting ~2 keys from leveldb took
19301ns
I0529 09:29:24.541520 18194 replica.cpp:676] Persisted action at 4
I0529 09:29:24.541529 18194 replica.cpp:661] Replica learned TRUNCATE action at
position 4
I0529 09:29:24.542551 18196 master.hpp:693] Removing offer
20140529-092924-16842879-48472-18175-0 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 (trusty)
I0529 09:29:24.542899 18196 master.cpp:1889] Processing reply for offers: [
20140529-092924-16842879-48472-18175-0 ] on slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
for framework 20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.543246 18196 master.hpp:655] Adding task
24a52e8f-0b30-4970-8829-a7383f19980e with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 (trusty)
I0529 09:29:24.543572 18196 master.cpp:3111] Launching task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 with resources cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
I0529 09:29:24.544034 18195 slave.cpp:933] Got assigned task
24a52e8f-0b30-4970-8829-a7383f19980e for framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.544123 18195 slave.cpp:3290] Checkpointing FrameworkInfo to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/framework.info'
I0529 09:29:24.544325 18195 slave.cpp:3297] Checkpointing framework pid
'scheduler(130)@127.0.1.1:48472' to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/framework.pid'
I0529 09:29:24.544543 18195 slave.cpp:1043] Launching task
24a52e8f-0b30-4970-8829-a7383f19980e for framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.545670 18195 slave.cpp:3612] Checkpointing ExecutorInfo to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/executor.info'
I0529 09:29:24.546325 18195 slave.cpp:3727] Checkpointing TaskInfo to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/runs/7fbaa2b9-d41e-4d3e-897f-3d593b1308c7/tasks/24a52e8f-0b30-4970-8829-a7383f19980e/task.info'
I0529 09:29:24.546923 18195 slave.cpp:1153] Queuing task
'24a52e8f-0b30-4970-8829-a7383f19980e' for executor
24a52e8f-0b30-4970-8829-a7383f19980e of framework
'20140529-092924-16842879-48472-18175-0000
E0529 09:29:24.547292 18195 slave.cpp:2397] Termination of executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
'20140529-092924-16842879-48472-18175-0000' failed: Unknown container:
7fbaa2b9-d41e-4d3e-897f-3d593b1308c7
E0529 09:29:24.547698 18195 slave.cpp:2686] Failed to unmonitor container for
executor 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000: Not monitored
I0529 09:29:24.548846 18195 slave.cpp:2018] Handling status update TASK_FAILED
(UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 from @0.0.0.0:0
I0529 09:29:24.549253 18195 slave.cpp:3660] Terminating task
24a52e8f-0b30-4970-8829-a7383f19980e
I0529 09:29:24.549710 18194 status_update_manager.cpp:320] Received status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.549733 18194 status_update_manager.cpp:499] Creating
StatusUpdate stream for task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.549865 18194 status_update_manager.hpp:342] Checkpointing UPDATE
for status update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for
task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.550242 18195 slave.cpp:425] Slave terminating
I0529 09:29:24.550930 18198 master.cpp:712] Slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
disconnected
I0529 09:29:24.550952 18198 master.cpp:1344] Disconnecting slave
20140529-092924-16842879-48472-18175-0
I0529 09:29:24.551013 18194 status_update_manager.cpp:373] Forwarding status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 to [email protected]:48472
I0529 09:29:24.551261 18198 hierarchical_allocator_process.hpp:483] Slave
20140529-092924-16842879-48472-18175-0 disconnected
I0529 09:29:24.551419 18194 master.cpp:2628] Status update TASK_FAILED (UUID:
aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 from slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
I0529 09:29:24.551475 18194 master.hpp:673] Removing task
24a52e8f-0b30-4970-8829-a7383f19980e with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140529-092924-16842879-48472-18175-0 (trusty)
GMOCK WARNING:
Uninteresting mock function call - returning directly.
Function call: statusUpdate(0x7fff69ebf410, @0x2ad9c80de320 72-byte object
<30-17 18-A5 D9-2A 00-00 00-00 00-00 00-00 00-00 D0-6B 0E-C8 D9-2A 00-00 50-4A
0E-C8 D9-2A 00-00 38-30 D1-00 00-00 00-00 90-F1 0D-C8 D9-2A 00-00 1B-13 63-6B
13-EB D4-41 03-00 00-00 00-00 00-00 37-00 00-00 00-00 00-00>)
Stack trace:
I0529 09:29:24.551668 18194 sched.cpp:625] Scheduler::statusUpdate took 87069ns
I0529 09:29:24.552121 18194 hierarchical_allocator_process.hpp:636] Recovered
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total
allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on
slave 20140529-092924-16842879-48472-18175-0 from framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.560880 18197 slave.cpp:143] Slave started on 137)@127.0.1.1:48472
I0529 09:29:24.560909 18197 credentials.hpp:35] Loading credentials for
authentication from
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/credential'
I0529 09:29:24.560972 18197 slave.cpp:242] Slave using credential for:
test-principal
I0529 09:29:24.561058 18197 slave.cpp:255] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0529 09:29:24.561120 18197 slave.cpp:283] Slave hostname: trusty
I0529 09:29:24.561133 18197 slave.cpp:284] Slave checkpoint: true
I0529 09:29:24.561491 18197 state.cpp:33] Recovering state from
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta'
W0529 09:29:24.562824 18197 state.cpp:426] Failed to find executor forked pid
file
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/runs/7fbaa2b9-d41e-4d3e-897f-3d593b1308c7/pids/forked.pid'
I0529 09:29:24.563506 18201 slave.cpp:3086] Recovering framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.563535 18201 slave.cpp:3462] Recovering executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.563608 18201 slave.cpp:3660] Terminating task
24a52e8f-0b30-4970-8829-a7383f19980e
I0529 09:29:24.564241 18201 status_update_manager.cpp:193] Recovering status
update manager
I0529 09:29:24.564257 18201 status_update_manager.cpp:201] Recovering executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.564275 18201 status_update_manager.cpp:499] Creating
StatusUpdate stream for task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.564371 18201 status_update_manager.hpp:306] Replaying status
update stream for task 24a52e8f-0b30-4970-8829-a7383f19980e
W0529 09:29:24.564391 18201 status_update_manager.cpp:381] Not forwarding
status update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 because no master is elected yet
I0529 09:29:24.564512 18201 slave.cpp:497] Successfully attached file
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/runs/7fbaa2b9-d41e-4d3e-897f-3d593b1308c7'
I0529 09:29:24.564915 18194 slave.cpp:2967] Unable to reconnect to executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
20140529-092924-16842879-48472-18175-0000 because no libprocess PID was found
E0529 09:29:24.564950 18194 slave.cpp:2397] Termination of executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
'20140529-092924-16842879-48472-18175-0000' failed: Unknown container:
7fbaa2b9-d41e-4d3e-897f-3d593b1308c7
E0529 09:29:24.565001 18194 slave.cpp:2686] Failed to unmonitor container for
executor 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000: Not monitored
I0529 09:29:24.581280 18198 hierarchical_allocator_process.hpp:687] Performed
allocation for 1 slaves in 56677ns
I0529 09:29:24.581333 18198 slave.cpp:1967] Cleaning up un-reregistered
executors
I0529 09:29:24.581362 18198 slave.cpp:3018] Finished recovery
I0529 09:29:24.581876 18198 slave.cpp:536] New master detected at
[email protected]:48472
I0529 09:29:24.581909 18198 slave.cpp:612] Authenticating with master
[email protected]:48472
I0529 09:29:24.581959 18198 slave.cpp:585] Detecting new master
I0529 09:29:24.581992 18198 status_update_manager.cpp:167] New master detected
at [email protected]:48472
W0529 09:29:24.582008 18198 status_update_manager.cpp:181] Resending status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.582025 18198 status_update_manager.cpp:373] Forwarding status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 to [email protected]:48472
I0529 09:29:24.582099 18198 authenticatee.hpp:128] Creating new client SASL
connection
W0529 09:29:24.582298 18198 master.cpp:2621] Could not lookup task for status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 from slave
20140529-092924-16842879-48472-18175-0 at slave(136)@127.0.1.1:48472 (trusty)
I0529 09:29:24.582356 18198 master.cpp:2986] Authenticating
slave(137)@127.0.1.1:48472
I0529 09:29:24.582557 18195 authenticator.hpp:156] Creating new server SASL
connection
I0529 09:29:24.582679 18195 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0529 09:29:24.582700 18195 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0529 09:29:24.582726 18195 authenticator.hpp:262] Received SASL authentication
start
I0529 09:29:24.582770 18195 authenticator.hpp:384] Authentication requires more
steps
I0529 09:29:24.582798 18195 authenticatee.hpp:265] Received SASL authentication
step
I0529 09:29:24.582834 18195 authenticator.hpp:290] Received SASL authentication
step
I0529 09:29:24.582852 18195 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0529 09:29:24.582861 18195 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0529 09:29:24.582875 18195 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0529 09:29:24.582885 18195 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'trusty' server FQDN: 'trusty'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0529 09:29:24.582892 18195 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.582898 18195 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0529 09:29:24.582911 18195 authenticator.hpp:376] Authentication success
I0529 09:29:24.582938 18195 authenticatee.hpp:305] Authentication success
I0529 09:29:24.582962 18195 master.cpp:3026] Successfully authenticated
principal 'test-principal' at slave(137)@127.0.1.1:48472
I0529 09:29:24.583034 18195 slave.cpp:669] Successfully authenticated with
master [email protected]:48472
I0529 09:29:24.583108 18195 slave.cpp:902] Will retry registration in
13.697338ms if necessary
W0529 09:29:24.583199 18195 master.cpp:2417] Slave at
slave(137)@127.0.1.1:48472 (trusty) is being allowed to re-register with an
already in use id (20140529-092924-16842879-48472-18175-0)
I0529 09:29:24.583282 18195 slave.cpp:753] Re-registered with master
[email protected]:48472
I0529 09:29:24.583312 18195 slave.cpp:1514] Updating framework
20140529-092924-16842879-48472-18175-0000 pid to scheduler(130)@127.0.1.1:48472
I0529 09:29:24.583348 18195 slave.cpp:1522] Checkpointing framework pid
'scheduler(130)@127.0.1.1:48472' to
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/framework.pid'
I0529 09:29:24.583536 18195 hierarchical_allocator_process.hpp:497] Slave
20140529-092924-16842879-48472-18175-0 reconnected
W0529 09:29:24.583560 18195 status_update_manager.cpp:181] Resending status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.583573 18195 status_update_manager.cpp:373] Forwarding status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 to [email protected]:48472
W0529 09:29:24.583658 18195 master.cpp:2621] Could not lookup task for status
update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000 from slave
20140529-092924-16842879-48472-18175-0 at slave(137)@127.0.1.1:48472 (trusty)
I0529 09:29:24.584329 18198 sched.cpp:625] Scheduler::statusUpdate took 16101ns
tests/slave_recovery_tests.cpp:3068: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7fff69ebf410, @0x2ad9b40bc400 72-byte object
<30-17 18-A5 D9-2A 00-00 00-00 00-00 00-00 00-00 90-31 03-B4 D9-2A 00-00 B0-B1
0F-B4 D9-2A 00-00 38-30 D1-00 00-00 00-00 A0-39 0C-B4 D9-2A 00-00 1B-13 63-6B
13-EB D4-41 03-00 00-00 00-00 00-00 37-00 00-00 00-00 00-00>)
Expected: to be called once
Actual: called twice - over-saturated and active
I0529 09:29:24.584525 18198 sched.cpp:625] Scheduler::statusUpdate took 121892ns
I0529 09:29:24.584609 18198 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.584632 18198 status_update_manager.hpp:342] Checkpointing ACK
for status update TASK_FAILED (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for
task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585109 18194 sched.cpp:730] Stopping framework
'20140529-092924-16842879-48472-18175-0000'
I0529 09:29:24.585196 18196 master.cpp:1272] Asked to unregister framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585211 18196 master.cpp:3344] Removing framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585304 18196 hierarchical_allocator_process.hpp:407] Deactivated
framework 20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585342 18196 hierarchical_allocator_process.hpp:362] Removed
framework 20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585366 18196 slave.cpp:1337] Asked to shut down framework
20140529-092924-16842879-48472-18175-0000 by [email protected]:48472
I0529 09:29:24.585381 18196 slave.cpp:1362] Shutting down framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585396 18196 slave.cpp:2552] Cleaning up executor
'24a52e8f-0b30-4970-8829-a7383f19980e' of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585631 18196 slave.cpp:2627] Cleaning up framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.585710 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/runs/7fbaa2b9-d41e-4d3e-897f-3d593b1308c7'
for gc 6.99999322304296days in the future
I0529 09:29:24.585954 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e'
for gc 6.99999322254518days in the future
I0529 09:29:24.586061 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e/runs/7fbaa2b9-d41e-4d3e-897f-3d593b1308c7'
for gc 6.99999322225778days in the future
I0529 09:29:24.586094 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000/executors/24a52e8f-0b30-4970-8829-a7383f19980e'
for gc 6.99999322199111days in the future
I0529 09:29:24.586122 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000'
for gc 6.99999322143407days in the future
I0529 09:29:24.586148 18196 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_q4MPqy/meta/slaves/20140529-092924-16842879-48472-18175-0/frameworks/20140529-092924-16842879-48472-18175-0000'
for gc 6.99999322119111days in the future
I0529 09:29:24.586599 18196 master.cpp:574] Master terminating
I0529 09:29:24.589107 18199 slave.cpp:2224] [email protected]:48472 exited
W0529 09:29:24.589130 18199 slave.cpp:2227] Master disconnected! Waiting for a
new master to be elected
I0529 09:29:24.589537 18198 status_update_manager.cpp:530] Cleaning up status
update stream for task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.589920 18198 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.590104 18199 slave.cpp:1604] Status update manager successfully
handled status update acknowledgement (UUID:
aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
E0529 09:29:24.590286 18199 slave.cpp:1615] Status update acknowledgement
(UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of unknown framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.590569 18198 status_update_manager.cpp:282] Closing status
update streams for framework 20140529-092924-16842879-48472-18175-0000
E0529 09:29:24.590759 18194 slave.cpp:1597] Failed to handle status update
acknowledgement (UUID: aee0620e-c349-42ce-8cce-7b3835adad95) for task
24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000: Cannot find the status update stream
for task 24a52e8f-0b30-4970-8829-a7383f19980e of framework
20140529-092924-16842879-48472-18175-0000
I0529 09:29:24.592489 18175 slave.cpp:425] Slave terminating
[ FAILED ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch, where
TypeParam = mesos::internal::slave::MesosContainerizer (112 ms)
{noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)