Yan Xu created MESOS-1464:
-----------------------------
Summary: SlaveRecoveryTest/0.ShutdownSlave runs forever
Key: MESOS-1464
URL: https://issues.apache.org/jira/browse/MESOS-1464
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.19.0
Reporter: Yan Xu
{noformat}
[ RUN ] SlaveRecoveryTest/0.ShutdownSlave
Using temporary directory '/tmp/SlaveRecoveryTest_0_ShutdownSlave_7wX1Wv'
I0606 18:38:59.725330 24987 leveldb.cpp:176] Opened db in 788125ns
I0606 18:38:59.726125 24987 leveldb.cpp:183] Compacted db in 405518ns
I0606 18:38:59.726558 24987 leveldb.cpp:198] Created db iterator in 18709ns
I0606 18:38:59.727007 24987 leveldb.cpp:204] Seeked to beginning of db in
70836ns
I0606 18:38:59.727603 24987 leveldb.cpp:273] Iterated through 0 keys in the db
in 13260ns
I0606 18:38:59.727993 24987 replica.cpp:741] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0606 18:38:59.728660 25007 recover.cpp:425] Starting replica recovery
I0606 18:38:59.728736 25007 recover.cpp:451] Replica is in EMPTY status
I0606 18:38:59.729061 25007 replica.cpp:638] Replica in EMPTY status received a
broadcasted recover request
I0606 18:38:59.729132 25007 recover.cpp:188] Received a recover response from a
replica in EMPTY status
I0606 18:38:59.729236 25007 recover.cpp:542] Updating replica status to STARTING
I0606 18:38:59.729341 25007 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 55958ns
I0606 18:38:59.729357 25007 replica.cpp:320] Persisted replica status to
STARTING
I0606 18:38:59.729409 25007 recover.cpp:451] Replica is in STARTING status
I0606 18:38:59.729634 25007 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I0606 18:38:59.729704 25007 recover.cpp:188] Received a recover response from a
replica in STARTING status
I0606 18:38:59.729815 25007 recover.cpp:542] Updating replica status to VOTING
I0606 18:38:59.729876 25007 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 22335ns
I0606 18:38:59.729890 25007 replica.cpp:320] Persisted replica status to VOTING
I0606 18:38:59.729919 25007 recover.cpp:556] Successfully joined the Paxos group
I0606 18:38:59.729964 25007 recover.cpp:440] Recover process terminated
I0606 18:38:59.734488 25005 master.cpp:272] Master
20140606-183859-2272962752-40636-24987 (fedora-19) started on
192.168.122.135:40636
I0606 18:38:59.734532 25005 master.cpp:309] Master only allowing authenticated
frameworks to register
I0606 18:38:59.734540 25005 master.cpp:314] Master only allowing authenticated
slaves to register
I0606 18:38:59.734547 25005 credentials.hpp:35] Loading credentials for
authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_7wX1Wv/credentials'
I0606 18:38:59.734627 25005 master.cpp:340] Master enabling authorization
I0606 18:38:59.735643 25009 hierarchical_allocator_process.hpp:301]
Initializing hierarchical allocator process with master :
[email protected]:40636
I0606 18:38:59.735699 25011 master.cpp:108] No whitelist given. Advertising
offers for all slaves
I0606 18:38:59.736399 25005 master.cpp:957] The newly elected leader is
[email protected]:40636 with id 20140606-183859-2272962752-40636-24987
I0606 18:38:59.736423 25005 master.cpp:970] Elected as the leading master!
I0606 18:38:59.736431 25005 master.cpp:788] Recovering from registrar
I0606 18:38:59.736482 25005 registrar.cpp:313] Recovering registrar
I0606 18:38:59.736695 25005 log.cpp:656] Attempting to start the writer
I0606 18:38:59.737006 25005 replica.cpp:474] Replica received implicit promise
request with proposal 1
I0606 18:38:59.737486 25005 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 471935ns
I0606 18:38:59.737920 25005 replica.cpp:342] Persisted promised to 1
I0606 18:38:59.738417 25005 coordinator.cpp:230] Coordinator attemping to fill
missing position
I0606 18:38:59.739327 25005 replica.cpp:375] Replica received explicit promise
request for position 0 with proposal 2
I0606 18:38:59.739951 25005 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 35555ns
I0606 18:38:59.740437 25005 replica.cpp:676] Persisted action at 0
I0606 18:38:59.744415 25010 replica.cpp:508] Replica received write request for
position 0
I0606 18:38:59.744458 25010 leveldb.cpp:438] Reading position from leveldb took
23012ns
I0606 18:38:59.744488 25010 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 20719ns
I0606 18:38:59.744499 25010 replica.cpp:676] Persisted action at 0
I0606 18:38:59.744623 25010 replica.cpp:655] Replica received learned notice
for position 0
I0606 18:38:59.744652 25010 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 19198ns
I0606 18:38:59.744663 25010 replica.cpp:676] Persisted action at 0
I0606 18:38:59.744669 25010 replica.cpp:661] Replica learned NOP action at
position 0
I0606 18:38:59.744842 25010 log.cpp:672] Writer started with ending position 0
I0606 18:38:59.745101 25010 leveldb.cpp:438] Reading position from leveldb took
27519ns
I0606 18:38:59.746109 25009 registrar.cpp:346] Successfully fetched the
registry (0B)
I0606 18:38:59.746136 25009 registrar.cpp:422] Attempting to update the
'registry'
I0606 18:38:59.747542 25010 log.cpp:680] Attempting to append 131 bytes to the
log
I0606 18:38:59.748111 25007 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
I0606 18:38:59.748301 25007 replica.cpp:508] Replica received write request for
position 1
I0606 18:38:59.748342 25007 leveldb.cpp:343] Persisting action (150 bytes) to
leveldb took 27852ns
I0606 18:38:59.748353 25007 replica.cpp:676] Persisted action at 1
I0606 18:38:59.748498 25007 replica.cpp:655] Replica received learned notice
for position 1
I0606 18:38:59.748530 25007 leveldb.cpp:343] Persisting action (152 bytes) to
leveldb took 22953ns
I0606 18:38:59.748541 25007 replica.cpp:676] Persisted action at 1
I0606 18:38:59.748548 25007 replica.cpp:661] Replica learned APPEND action at
position 1
I0606 18:38:59.749064 25010 registrar.cpp:479] Successfully updated 'registry'
I0606 18:38:59.749287 25007 log.cpp:699] Attempting to truncate the log to 1
I0606 18:38:59.749758 25006 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I0606 18:38:59.749932 25006 replica.cpp:508] Replica received write request for
position 2
I0606 18:38:59.749965 25006 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 23310ns
I0606 18:38:59.749977 25006 replica.cpp:676] Persisted action at 2
I0606 18:38:59.750231 25006 replica.cpp:655] Replica received learned notice
for position 2
I0606 18:38:59.750263 25006 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 23049ns
I0606 18:38:59.750280 25006 leveldb.cpp:401] Deleting ~1 keys from leveldb took
10864ns
I0606 18:38:59.750289 25006 replica.cpp:676] Persisted action at 2
I0606 18:38:59.750298 25006 replica.cpp:661] Replica learned TRUNCATE action at
position 2
I0606 18:38:59.750797 25010 registrar.cpp:372] Successfully recovered registrar
I0606 18:38:59.754741 24987 mesos_containerizer.cpp:124] Using isolation:
posix/cpu,posix/mem
I0606 18:38:59.755175 25006 master.cpp:815] Recovered 0 slaves from the
Registry (95B) ; allowing 10mins for slaves to re-register
I0606 18:38:59.758690 25011 slave.cpp:143] Slave started on
53)@192.168.122.135:40636
I0606 18:38:59.758719 25011 credentials.hpp:35] Loading credentials for
authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/credential'
I0606 18:38:59.758790 25011 slave.cpp:243] Slave using credential for:
test-principal
I0606 18:38:59.758880 25011 slave.cpp:256] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0606 18:38:59.759521 25011 slave.cpp:284] Slave hostname: fedora-19
I0606 18:38:59.759539 25011 slave.cpp:285] Slave checkpoint: true
I0606 18:38:59.760252 25011 state.cpp:33] Recovering state from
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta'
I0606 18:38:59.760339 25011 status_update_manager.cpp:193] Recovering status
update manager
I0606 18:38:59.760387 25011 mesos_containerizer.cpp:281] Recovering
containerizer
I0606 18:38:59.760546 25011 slave.cpp:3024] Finished recovery
I0606 18:38:59.760848 25011 slave.cpp:537] New master detected at
[email protected]:40636
I0606 18:38:59.760875 25011 slave.cpp:613] Authenticating with master
[email protected]:40636
I0606 18:38:59.760916 25011 slave.cpp:586] Detecting new master
I0606 18:38:59.760947 25011 status_update_manager.cpp:167] New master detected
at [email protected]:40636
I0606 18:38:59.760979 25011 authenticatee.hpp:128] Creating new client SASL
connection
I0606 18:38:59.761829 25011 master.cpp:2966] Authenticating
slave(53)@192.168.122.135:40636
I0606 18:38:59.762437 25011 authenticator.hpp:156] Creating new server SASL
connection
I0606 18:38:59.763381 25005 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0606 18:38:59.763427 25005 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0606 18:38:59.763705 25011 authenticator.hpp:262] Received SASL authentication
start
I0606 18:38:59.764219 25011 authenticator.hpp:384] Authentication requires more
steps
I0606 18:38:59.764729 25010 authenticatee.hpp:265] Received SASL authentication
step
I0606 18:38:59.765012 25011 authenticator.hpp:290] Received SASL authentication
step
I0606 18:38:59.765466 25011 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0606 18:38:59.765903 25011 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0606 18:38:59.766316 25011 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0606 18:38:59.766685 25011 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0606 18:38:59.767141 25011 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0606 18:38:59.767516 25011 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0606 18:38:59.767892 25011 authenticator.hpp:376] Authentication success
I0606 18:38:59.768312 25007 authenticatee.hpp:305] Authentication success
I0606 18:38:59.773574 25008 master.cpp:3006] Successfully authenticated
principal 'test-principal' at slave(53)@192.168.122.135:40636
I0606 18:38:59.775130 25007 slave.cpp:670] Successfully authenticated with
master [email protected]:40636
I0606 18:38:59.775876 24987 sched.cpp:128] Version: 0.20.0
I0606 18:38:59.776198 25009 master.cpp:2282] Registering slave at
slave(53)@192.168.122.135:40636 (fedora-19) with id
20140606-183859-2272962752-40636-24987-0
I0606 18:38:59.776563 25009 registrar.cpp:422] Attempting to update the
'registry'
I0606 18:38:59.777896 25009 log.cpp:680] Attempting to append 312 bytes to the
log
I0606 18:38:59.778344 25011 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 3
I0606 18:38:59.778573 25011 replica.cpp:508] Replica received write request for
position 3
I0606 18:38:59.778632 25011 leveldb.cpp:343] Persisting action (331 bytes) to
leveldb took 45885ns
I0606 18:38:59.778646 25011 replica.cpp:676] Persisted action at 3
I0606 18:38:59.779242 25010 replica.cpp:655] Replica received learned notice
for position 3
I0606 18:38:59.779285 25010 leveldb.cpp:343] Persisting action (333 bytes) to
leveldb took 27351ns
I0606 18:38:59.779311 25010 replica.cpp:676] Persisted action at 3
I0606 18:38:59.779321 25010 replica.cpp:661] Replica learned APPEND action at
position 3
I0606 18:38:59.782156 25005 registrar.cpp:479] Successfully updated 'registry'
I0606 18:38:59.782237 25005 master.cpp:2322] Registered slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19)
I0606 18:38:59.782253 25005 master.cpp:3452] Adding slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0606 18:38:59.782378 25005 hierarchical_allocator_process.hpp:444] Added slave
20140606-183859-2272962752-40636-24987-0 (fedora-19) with cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] available)
I0606 18:38:59.782421 25005 hierarchical_allocator_process.hpp:707] Performed
allocation for slave 20140606-183859-2272962752-40636-24987-0 in 17098ns
I0606 18:38:59.782996 25012 log.cpp:699] Attempting to truncate the log to 3
I0606 18:38:59.783079 25012 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 4
I0606 18:38:59.783355 25012 replica.cpp:508] Replica received write request for
position 4
I0606 18:38:59.783396 25012 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 26490ns
I0606 18:38:59.783408 25012 replica.cpp:676] Persisted action at 4
I0606 18:38:59.783545 25012 replica.cpp:655] Replica received learned notice
for position 4
I0606 18:38:59.783577 25012 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 21641ns
I0606 18:38:59.783597 25012 leveldb.cpp:401] Deleting ~2 keys from leveldb took
13425ns
I0606 18:38:59.783607 25012 replica.cpp:676] Persisted action at 4
I0606 18:38:59.783617 25012 replica.cpp:661] Replica learned TRUNCATE action at
position 4
I0606 18:38:59.776222 25007 slave.cpp:908] Will retry registration in 8.32368ms
if necessary
I0606 18:38:59.787597 25007 slave.cpp:704] Registered with master
[email protected]:40636; given slave ID
20140606-183859-2272962752-40636-24987-0
I0606 18:38:59.787914 25008 sched.cpp:224] New master detected at
[email protected]:40636
I0606 18:38:59.788280 25008 sched.cpp:274] Authenticating with master
[email protected]:40636
I0606 18:38:59.788406 25008 authenticatee.hpp:128] Creating new client SASL
connection
I0606 18:38:59.788957 25008 master.cpp:2966] Authenticating
scheduler(46)@192.168.122.135:40636
I0606 18:38:59.789103 25005 authenticator.hpp:156] Creating new server SASL
connection
I0606 18:38:59.789572 25005 authenticatee.hpp:219] Received SASL authentication
mechanisms: CRAM-MD5
I0606 18:38:59.789598 25005 authenticatee.hpp:245] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0606 18:38:59.789623 25005 authenticator.hpp:262] Received SASL authentication
start
I0606 18:38:59.789661 25005 authenticator.hpp:384] Authentication requires more
steps
I0606 18:38:59.789687 25005 authenticatee.hpp:265] Received SASL authentication
step
I0606 18:38:59.789722 25005 authenticator.hpp:290] Received SASL authentication
step
I0606 18:38:59.789739 25005 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0606 18:38:59.789748 25005 auxprop.cpp:153] Looking up auxiliary property
'*userPassword'
I0606 18:38:59.789760 25005 auxprop.cpp:153] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0606 18:38:59.789769 25005 auxprop.cpp:81] Request to lookup properties for
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0606 18:38:59.789777 25005 auxprop.cpp:103] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0606 18:38:59.789783 25005 auxprop.cpp:103] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0606 18:38:59.789793 25005 authenticator.hpp:376] Authentication success
I0606 18:38:59.789819 25005 authenticatee.hpp:305] Authentication success
I0606 18:38:59.789841 25005 master.cpp:3006] Successfully authenticated
principal 'test-principal' at scheduler(46)@192.168.122.135:40636
I0606 18:38:59.789906 25005 sched.cpp:348] Successfully authenticated with
master [email protected]:40636
I0606 18:38:59.789921 25005 sched.cpp:467] Sending registration request to
[email protected]:40636
I0606 18:38:59.789963 25005 master.cpp:1048] Received registration request from
scheduler(46)@192.168.122.135:40636
I0606 18:38:59.789994 25005 master.cpp:1066] Registering framework
20140606-183859-2272962752-40636-24987-0000 at
scheduler(46)@192.168.122.135:40636
I0606 18:38:59.790093 25008 sched.cpp:398] Framework registered with
20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.790118 25008 sched.cpp:412] Scheduler::registered took 12838ns
I0606 18:38:59.790499 25005 hierarchical_allocator_process.hpp:331] Added
framework 20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.790542 25005 hierarchical_allocator_process.hpp:751] Offering
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 to framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.790628 25005 hierarchical_allocator_process.hpp:687] Performed
allocation for 1 slaves in 115789ns
I0606 18:38:59.790678 25005 master.hpp:693] Adding offer
20140606-183859-2272962752-40636-24987-0 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 (fedora-19)
I0606 18:38:59.790717 25005 master.cpp:2913] Sending 1 offers to framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.791182 25005 sched.cpp:535] Scheduler::resourceOffers took
25837ns
I0606 18:38:59.792461 25007 slave.cpp:717] Checkpointing SlaveInfo to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/slave.info'
I0606 18:38:59.793648 25006 master.hpp:703] Removing offer
20140606-183859-2272962752-40636-24987-0 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 (fedora-19)
I0606 18:38:59.793702 25006 master.cpp:1869] Processing reply for offers: [
20140606-183859-2272962752-40636-24987-0 ] on slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19) for framework 20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.793773 25006 master.hpp:665] Adding task
de849715-bebc-44f6-b743-b9936ba4c0a4 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 (fedora-19)
I0606 18:38:59.793812 25006 master.cpp:3091] Launching task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 with resources cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19)
I0606 18:38:59.793915 25006 slave.cpp:939] Got assigned task
de849715-bebc-44f6-b743-b9936ba4c0a4 for framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.793983 25006 slave.cpp:3296] Checkpointing FrameworkInfo to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/framework.info'
I0606 18:38:59.794581 25006 slave.cpp:3303] Checkpointing framework pid
'scheduler(46)@192.168.122.135:40636' to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/framework.pid'
I0606 18:38:59.794858 25006 slave.cpp:1049] Launching task
de849715-bebc-44f6-b743-b9936ba4c0a4 for framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.796583 25006 slave.cpp:3618] Checkpointing ExecutorInfo to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/executor.info'
I0606 18:38:59.797668 25008 mesos_containerizer.cpp:537] Starting container
'6266edf5-4bbe-40a4-a09d-358472cb8041' for executor
'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework
'20140606-183859-2272962752-40636-24987-0000'
I0606 18:38:59.800356 25012 launcher.cpp:117] Forked child with pid '25411' for
container '6266edf5-4bbe-40a4-a09d-358472cb8041'
I0606 18:38:59.800930 25012 mesos_containerizer.cpp:803] Checkpointing
executor's forked pid 25411 to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/pids/forked.pid'
I0606 18:38:59.802824 25012 mesos_containerizer.cpp:647] Fetching URIs for
container '6266edf5-4bbe-40a4-a09d-358472cb8041' using command
'/var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-fetcher'
I0606 18:38:59.803532 25006 slave.cpp:3733] Checkpointing TaskInfo to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/tasks/de849715-bebc-44f6-b743-b9936ba4c0a4/task.info'
I0606 18:38:59.812248 25006 slave.cpp:1159] Queuing task
'de849715-bebc-44f6-b743-b9936ba4c0a4' for executor
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
'20140606-183859-2272962752-40636-24987-0000
I0606 18:38:59.812863 25006 slave.cpp:498] Successfully attached file
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041'
I0606 18:39:00.736346 25005 hierarchical_allocator_process.hpp:687] Performed
allocation for 1 slaves in 139472ns
I0606 18:39:00.737416 25010 slave.cpp:2361] Monitoring executor
'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework
'20140606-183859-2272962752-40636-24987-0000' in container
'6266edf5-4bbe-40a4-a09d-358472cb8041'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0606 18:39:00.826778 25411 process.cpp:1671] libprocess is initialized on
192.168.122.135:39003 for 8 cpus
I0606 18:39:00.829923 25411 exec.cpp:131] Version: 0.20.0
I0606 18:39:00.832566 25449 exec.cpp:181] Executor started at:
executor(1)@192.168.122.135:39003 with pid 25411
I0606 18:39:00.833638 25010 slave.cpp:1670] Got registration for executor
'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.833713 25010 slave.cpp:1755] Checkpointing executor pid
'executor(1)@192.168.122.135:39003' to
'/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/pids/libprocess.pid'
I0606 18:39:00.834895 25010 slave.cpp:1789] Flushing queued task
de849715-bebc-44f6-b743-b9936ba4c0a4 for executor
'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.835798 25447 exec.cpp:205] Executor registered on slave
20140606-183859-2272962752-40636-24987-0
I0606 18:39:00.836727 25447 exec.cpp:217] Executor::registered took 111641ns
Registered executor on fedora-19
I0606 18:39:00.837342 25013 process.cpp:1098] Socket closed while receiving
I0606 18:39:00.838347 25445 exec.cpp:292] Executor asked to run task
'de849715-bebc-44f6-b743-b9936ba4c0a4'
I0606 18:39:00.838433 25445 exec.cpp:301] Executor::launchTask took 69226ns
Starting task de849715-bebc-44f6-b743-b9936ba4c0a4
I0606 18:39:00.843193 25450 process.cpp:1037] Socket closed while receiving
sh -c 'sleep 1000'
Forked command at 25451
I0606 18:39:00.854920 25449 exec.cpp:524] Executor sending status update
TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.855803 25010 slave.cpp:2024] Handling status update TASK_RUNNING
(UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 from
executor(1)@192.168.122.135:39003
I0606 18:39:00.855929 25010 status_update_manager.cpp:320] Received status
update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.855947 25010 status_update_manager.cpp:499] Creating
StatusUpdate stream for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.856501 25010 status_update_manager.hpp:342] Checkpointing UPDATE
for status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for
task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.856595 25010 status_update_manager.cpp:373] Forwarding status
update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 to [email protected]:40636
I0606 18:39:00.856740 25010 master.cpp:2608] Status update TASK_RUNNING (UUID:
7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 from slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19)
I0606 18:39:00.856777 25010 slave.cpp:2145] Status update manager successfully
handled status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee)
for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.856789 25010 slave.cpp:2151] Sending acknowledgement for status
update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 to executor(1)@192.168.122.135:39003
I0606 18:39:00.857117 25005 sched.cpp:626] Scheduler::statusUpdate took 46862ns
I0606 18:39:00.857756 25444 exec.cpp:338] Executor received status update
acknowledgement 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.858286 25445 exec.cpp:378] Executor asked to shutdown
I0606 18:39:00.858337 25445 exec.cpp:393] Executor::shutdown took 9726ns
I0606 18:39:00.858360 25445 exec.cpp:77] Scheduling shutdown of the executor
Shutting down
Sending SIGTERM to process tree at pid 25451
I0606 18:39:00.862169 25005 hierarchical_allocator_process.hpp:687] Performed
allocation for 1 slaves in 41161ns
I0606 18:39:00.866359 25013 process.cpp:1098] Socket closed while receiving
I0606 18:39:00.867178 25010 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.867209 25010 status_update_manager.hpp:342] Checkpointing ACK
for status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for
task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.867286 25010 slave.cpp:1610] Status update manager successfully
handled status update acknowledgement (UUID:
7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:00.867454 25450 process.cpp:1037] Socket closed while receiving
Killing the following process trees:
[
--- 25451 sleep 1000
]
Command terminated with signal Terminated (pid: 25451)
I0606 18:39:01.856297 25444 exec.cpp:524] Executor sending status update
TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.857009 25006 slave.cpp:2024] Handling status update TASK_KILLED
(UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 from
executor(1)@192.168.122.135:39003
I0606 18:39:01.857255 25006 slave.cpp:3666] Terminating task
de849715-bebc-44f6-b743-b9936ba4c0a4
I0606 18:39:01.857662 25006 status_update_manager.cpp:320] Received status
update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.857693 25006 status_update_manager.hpp:342] Checkpointing UPDATE
for status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for
task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.857761 25006 status_update_manager.cpp:373] Forwarding status
update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 to [email protected]:40636
I0606 18:39:01.857985 25006 master.cpp:2608] Status update TASK_KILLED (UUID:
9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 from slave
20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636
(fedora-19)
I0606 18:39:01.858161 25010 slave.cpp:2145] Status update manager successfully
handled status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141)
for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.858270 25010 slave.cpp:2151] Sending acknowledgement for status
update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 to executor(1)@192.168.122.135:39003
I0606 18:39:01.858680 25008 sched.cpp:626] Scheduler::statusUpdate took 34644ns
I0606 18:39:01.859246 25010 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.859273 25010 status_update_manager.hpp:342] Checkpointing ACK
for status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for
task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.859314 25010 status_update_manager.cpp:530] Cleaning up status
update stream for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.859410 25010 slave.cpp:1610] Status update manager successfully
handled status update acknowledgement (UUID:
9ee11dba-1cd6-496f-947f-0054329a6141) for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000
I0606 18:39:01.859427 25010 slave.cpp:3708] Completing task
de849715-bebc-44f6-b743-b9936ba4c0a4
I0606 18:39:01.859716 25444 exec.cpp:331] Ignoring status update
acknowledgement 9ee11dba-1cd6-496f-947f-0054329a6141 for task
de849715-bebc-44f6-b743-b9936ba4c0a4 of framework
20140606-183859-2272962752-40636-24987-0000 because the driver is aborted!
I0606 18:39:01.859803 25013 process.cpp:1098] Socket closed while receiving
I0606 18:39:01.859869 25450 process.cpp:1037] Socket closed while receiving
I0606 18:39:01.860499 25006 master.hpp:683] Removing task
de849715-bebc-44f6-b743-b9936ba4c0a4 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
20140606-183859-2272962752-40636-24987-0 (fedora-19)
I0606 18:39:01.860682 25006 hierarchical_allocator_process.hpp:636] Recovered
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total
allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on
slave 20140606-183859-2272962752-40636-24987-0 from framework
20140606-183859-2272962752-40636-24987-0000
2014-06-06
18:39:01,948:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
I0606 18:39:02.857594 25013 process.cpp:1037] Socket closed while receiving
2014-06-06
18:39:05,284:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:08,620:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:11,956:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:15,292:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:18,628:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:21,965:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:25,301:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:28,637:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-06-06
18:39:31,973:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
{noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)