[ 
https://issues.apache.org/jira/browse/MESOS-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998238#comment-13998238
 ] 

Ian Downes commented on MESOS-1370:
-----------------------------------

I'm currently running the code from https://reviews.apache.org/r/20958/,
rebased on master, which makes all code in the forked child async-signal-safe.

This test alone is now at iteration 80 with no failures so far (it was failing
within 10 iterations before).
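
For readers unfamiliar with the constraint, here is a minimal sketch of what "async-signal-safe in the forked child" means. It is not the code from the review above: the helper name `run_async_safe_child` and its return convention are assumptions made purely for illustration. The idea is that, in a multithreaded parent, another thread may hold a lock (malloc, logging, stdio) at the moment of fork(), so the child should restrict itself to async-signal-safe calls until it execs.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the Mesos patch): fork, then exec `path`
// in the child using only async-signal-safe calls.
// Returns the child's exit code, or -1 on error or abnormal termination.
int run_async_safe_child(const char* path, char* const argv[])
{
  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: only this thread exists now, but locks held by other parent
    // threads at fork time stay locked forever. Avoid anything that might
    // allocate, log, or touch stdio.
    execv(path, argv);

    // Only reached if execv failed. write() and _exit() are
    // async-signal-safe; printf(), std::cerr and exit() are not.
    static const char message[] = "execv failed\n";
    ssize_t ignored = write(STDERR_FILENO, message, sizeof(message) - 1);
    (void) ignored;
    _exit(127);
  }

  // Parent: reap the child, retrying if interrupted by a signal.
  int status = 0;
  pid_t reaped;
  do {
    reaped = waitpid(pid, &status, 0);
  } while (reaped == -1 && errno == EINTR);

  if (reaped != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `/bin/sh` and the arguments `sh -c 'exit 7'` should return 7. A path that cannot be executed returns 127, the value chosen here for the exec-failure branch.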

> SlaveRecoveryTest/0.RemoveNonCheckpointingFramework is flaky
> ------------------------------------------------------------
>
>                 Key: MESOS-1370
>                 URL: https://issues.apache.org/jira/browse/MESOS-1370
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.19.0
>            Reporter: Vinod Kone
>
> [ RUN      ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework
> Using temporary directory 
> '/tmp/SlaveRecoveryTest_0_RemoveNonCheckpointingFramework_ctBWwE'
> I0514 20:45:43.750376 36778 leveldb.cpp:176] Opened db in 116.754848ms
> I0514 20:45:43.766901 36778 leveldb.cpp:183] Compacted db in 16.431296ms
> I0514 20:45:43.767030 36778 leveldb.cpp:198] Created db iterator in 18645ns
> I0514 20:45:43.767072 36778 leveldb.cpp:204] Seeked to beginning of db in 
> 1665ns
> I0514 20:45:43.767089 36778 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 342ns
> I0514 20:45:43.767129 36778 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0514 20:45:43.767776 36801 recover.cpp:425] Starting replica recovery
> I0514 20:45:43.768074 36801 recover.cpp:451] Replica is in EMPTY status
> I0514 20:45:43.770112 36793 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0514 20:45:43.770190 36804 master.cpp:267] Master 
> 20140514-204543-1828659978-57356-36778 (smfd-atr-11-sr1.devel.twitter.com) 
> started on 10.35.255.108:57356
> I0514 20:45:43.770254 36804 master.cpp:304] Master only allowing 
> authenticated frameworks to register
> I0514 20:45:43.770282 36804 master.cpp:309] Master only allowing 
> authenticated slaves to register
> I0514 20:45:43.770301 36804 credentials.hpp:35] Loading credentials for 
> authentication
> W0514 20:45:43.770616 36804 credentials.hpp:48] Failed to stat credentials 
> file 
> 'file:///tmp/SlaveRecoveryTest_0_RemoveNonCheckpointingFramework_ctBWwE/credentials':
>  No such file or directory
> I0514 20:45:43.770705 36806 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0514 20:45:43.771790 36808 recover.cpp:542] Updating replica status to 
> STARTING
> I0514 20:45:43.772866 36801 master.cpp:919] The newly elected leader is 
> master@10.35.255.108:57356 with id 20140514-204543-1828659978-57356-36778
> I0514 20:45:43.772904 36801 master.cpp:929] Elected as the leading master!
> I0514 20:45:43.772924 36801 master.cpp:750] Recovering from registrar
> I0514 20:45:43.773144 36802 registrar.cpp:313] Recovering registrar
> I0514 20:45:43.825598 36813 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 53.288306ms
> I0514 20:45:43.825683 36813 replica.cpp:320] Persisted replica status to 
> STARTING
> I0514 20:45:43.825914 36813 recover.cpp:451] Replica is in STARTING status
> I0514 20:45:43.827308 36807 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0514 20:45:43.827898 36814 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0514 20:45:43.828277 36812 recover.cpp:542] Updating replica status to VOTING
> I0514 20:45:43.842594 36810 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 14.148097ms
> I0514 20:45:43.842643 36810 replica.cpp:320] Persisted replica status to 
> VOTING
> I0514 20:45:43.842743 36805 recover.cpp:556] Successfully joined the Paxos 
> group
> I0514 20:45:43.843027 36805 recover.cpp:440] Recover process terminated
> I0514 20:45:43.843966 36797 log.cpp:656] Attempting to start the writer
> I0514 20:45:43.845717 36794 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0514 20:45:43.850947 36794 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 5.159061ms
> I0514 20:45:43.851055 36794 replica.cpp:342] Persisted promised to 1
> I0514 20:45:43.851780 36801 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0514 20:45:43.853118 36800 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0514 20:45:43.859256 36800 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 6.092406ms
> I0514 20:45:43.859344 36800 replica.cpp:676] Persisted action at 0
> I0514 20:45:43.860616 36808 replica.cpp:508] Replica received write request 
> for position 0
> I0514 20:45:43.860724 36808 leveldb.cpp:438] Reading position from leveldb 
> took 70475ns
> I0514 20:45:43.867585 36808 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 6.825758ms
> I0514 20:45:43.867629 36808 replica.cpp:676] Persisted action at 0
> I0514 20:45:43.868314 36799 replica.cpp:655] Replica received learned notice 
> for position 0
> I0514 20:45:43.875905 36799 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 7.547621ms
> I0514 20:45:43.876014 36799 replica.cpp:676] Persisted action at 0
> I0514 20:45:43.876044 36799 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0514 20:45:43.876915 36798 log.cpp:672] Writer started with ending position 0
> I0514 20:45:43.878356 36794 leveldb.cpp:438] Reading position from leveldb 
> took 32642ns
> I0514 20:45:43.881211 36806 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0514 20:45:43.881439 36806 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0514 20:45:43.883990 36808 log.cpp:680] Attempting to append 155 bytes to 
> the log
> I0514 20:45:43.884259 36801 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0514 20:45:43.885217 36802 replica.cpp:508] Replica received write request 
> for position 1
> I0514 20:45:43.892568 36802 leveldb.cpp:343] Persisting action (174 bytes) to 
> leveldb took 7.301141ms
> I0514 20:45:43.892647 36802 replica.cpp:676] Persisted action at 1
> I0514 20:45:43.893304 36798 replica.cpp:655] Replica received learned notice 
> for position 1
> I0514 20:45:43.900894 36798 leveldb.cpp:343] Persisting action (176 bytes) to 
> leveldb took 7.549491ms
> I0514 20:45:43.900982 36798 replica.cpp:676] Persisted action at 1
> I0514 20:45:43.901008 36798 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0514 20:45:43.902535 36812 registrar.cpp:479] Successfully updated 'registry'
> I0514 20:45:43.902695 36812 registrar.cpp:372] Successfully recovered 
> registrar
> I0514 20:45:43.902731 36816 log.cpp:699] Attempting to truncate the log to 1
> I0514 20:45:43.902936 36804 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0514 20:45:43.902962 36805 master.cpp:777] Recovered 0 slaves from the 
> Registry (117B) ; allowing 10mins for slaves to re-register
> I0514 20:45:43.904026 36799 replica.cpp:508] Replica received write request 
> for position 2
> I0514 20:45:43.905686 36778 mesos_containerizer.cpp:122] Using isolation: 
> cgroups/cpu,cgroups/mem
> I0514 20:45:43.909281 36799 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 5.203581ms
> I0514 20:45:43.909320 36799 replica.cpp:676] Persisted action at 2
> I0514 20:45:43.909922 36795 replica.cpp:655] Replica received learned notice 
> for position 2
> I0514 20:45:43.917597 36795 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 7.625126ms
> I0514 20:45:43.917716 36795 leveldb.cpp:401] Deleting ~1 keys from leveldb 
> took 25746ns
> I0514 20:45:43.917744 36795 replica.cpp:676] Persisted action at 2
> I0514 20:45:43.917799 36795 replica.cpp:661] Replica learned TRUNCATE action 
> at position 2
> I0514 20:45:43.956254 36778 linux_launcher.cpp:66] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0514 20:45:43.960209 36793 slave.cpp:143] Slave started on 
> 14)@10.35.255.108:57356
> I0514 20:45:43.960284 36793 slave.cpp:152] Moving slave process into its own 
> cgroup
> I0514 20:45:43.963740 36778 sched.cpp:121] Version: 0.19.0
> I0514 20:45:43.964298 36800 sched.cpp:217] New master detected at 
> master@10.35.255.108:57356
> I0514 20:45:43.964359 36800 sched.cpp:268] Authenticating with master 
> master@10.35.255.108:57356
> I0514 20:45:43.964944 36811 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0514 20:45:43.965106 36811 master.cpp:2803] Authenticating 
> scheduler(8)@10.35.255.108:57356
> I0514 20:45:43.965680 36794 authenticator.hpp:148] Creating new server SASL 
> connection
> I0514 20:45:43.966018 36794 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0514 20:45:43.966141 36794 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0514 20:45:43.966218 36794 authenticator.hpp:254] Received SASL 
> authentication start
> I0514 20:45:43.966291 36794 authenticator.hpp:342] Authentication requires 
> more steps
> I0514 20:45:43.966361 36794 authenticatee.hpp:265] Received SASL 
> authentication step
> I0514 20:45:43.966521 36794 authenticator.hpp:282] Received SASL 
> authentication step
> I0514 20:45:43.966642 36794 authenticator.hpp:334] Authentication success
> I0514 20:45:43.966730 36810 master.cpp:2843] Successfully authenticated 
> scheduler(8)@10.35.255.108:57356
> I0514 20:45:43.966733 36805 authenticatee.hpp:305] Authentication success
> I0514 20:45:43.967176 36805 sched.cpp:342] Successfully authenticated with 
> master master@10.35.255.108:57356
> I0514 20:45:43.967331 36814 master.cpp:978] Received registration request 
> from scheduler(8)@10.35.255.108:57356
> I0514 20:45:43.967427 36814 master.cpp:996] Registering framework 
> 20140514-204543-1828659978-57356-36778-0000 at 
> scheduler(8)@10.35.255.108:57356
> I0514 20:45:43.967639 36806 sched.cpp:392] Framework registered with 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:43.967727 36801 hierarchical_allocator_process.hpp:331] Added 
> framework 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:43.972489 36793 slave.cpp:152] Moving slave process into its own 
> cgroup
> I0514 20:45:43.978257 36793 credentials.hpp:35] Loading credentials for 
> authentication
> W0514 20:45:43.978394 36793 credentials.hpp:48] Failed to stat credentials 
> file 
> 'file:///tmp/SlaveRecoveryTest_0_RemoveNonCheckpointingFramework_CN6t3w/credential':
>  No such file or directory
> I0514 20:45:43.978446 36793 slave.cpp:239] Slave using credential for: 
> test-principal
> I0514 20:45:43.978675 36793 slave.cpp:252] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0514 20:45:43.979122 36793 slave.cpp:280] Slave hostname: 
> smfd-atr-11-sr1.devel.twitter.com
> I0514 20:45:43.979161 36793 slave.cpp:281] Slave checkpoint: true
> I0514 20:45:43.980404 36798 state.cpp:33] Recovering state from 
> '/tmp/SlaveRecoveryTest_0_RemoveNonCheckpointingFramework_CN6t3w/meta'
> I0514 20:45:43.980890 36803 status_update_manager.cpp:193] Recovering status 
> update manager
> I0514 20:45:43.981730 36806 mesos_containerizer.cpp:279] Recovering 
> containerizer
> I0514 20:45:43.983495 36795 mem.cpp:165] Removing orphaned cgroup 
> 'mesos_test_d6f332f1-3598-460e-a74e-5749a7bf7ac6/slave'
> I0514 20:45:43.984102 36814 cpushare.cpp:189] Removing orphaned cgroup 
> 'cpuacct/mesos_test_d6f332f1-3598-460e-a74e-5749a7bf7ac6/slave'
> I0514 20:45:43.987674 36801 slave.cpp:2975] Finished recovery
> I0514 20:45:43.988082 36807 slave.cpp:533] New master detected at 
> master@10.35.255.108:57356
> I0514 20:45:43.988185 36807 slave.cpp:575] Detecting new master
> I0514 20:45:43.988185 36803 status_update_manager.cpp:167] New master 
> detected at master@10.35.255.108:57356
> I0514 20:45:43.993654 36803 slave.cpp:602] Authenticating with master 
> master@10.35.255.108:57356
> I0514 20:45:43.993875 36795 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0514 20:45:43.994104 36799 master.cpp:2803] Authenticating 
> slave(14)@10.35.255.108:57356
> I0514 20:45:43.994382 36800 authenticator.hpp:148] Creating new server SASL 
> connection
> I0514 20:45:43.994673 36809 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0514 20:45:43.994724 36809 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0514 20:45:43.994840 36796 authenticator.hpp:254] Received SASL 
> authentication start
> I0514 20:45:43.994940 36796 authenticator.hpp:342] Authentication requires 
> more steps
> I0514 20:45:43.995050 36796 authenticatee.hpp:265] Received SASL 
> authentication step
> I0514 20:45:43.995323 36813 authenticator.hpp:282] Received SASL 
> authentication step
> I0514 20:45:43.995424 36813 authenticator.hpp:334] Authentication success
> I0514 20:45:43.995553 36805 master.cpp:2843] Successfully authenticated 
> slave(14)@10.35.255.108:57356
> I0514 20:45:43.995612 36793 authenticatee.hpp:305] Authentication success
> I0514 20:45:43.996165 36797 slave.cpp:659] Successfully authenticated with 
> master master@10.35.255.108:57356
> I0514 20:45:43.996450 36811 master.cpp:2134] Registering slave at 
> slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com) with id 
> 20140514-204543-1828659978-57356-36778-0
> I0514 20:45:43.996803 36802 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0514 20:45:43.999317 36793 log.cpp:680] Attempting to append 382 bytes to 
> the log
> I0514 20:45:43.999510 36802 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 3
> I0514 20:45:44.000373 36801 replica.cpp:508] Replica received write request 
> for position 3
> I0514 20:45:44.013892 36812 master.cpp:2122] Ignoring register slave message 
> from slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com) as 
> admission is already in progress
> I0514 20:45:44.017613 36801 leveldb.cpp:343] Persisting action (401 bytes) to 
> leveldb took 17.134038ms
> I0514 20:45:44.017695 36801 replica.cpp:676] Persisted action at 3
> I0514 20:45:44.018345 36797 replica.cpp:655] Replica received learned notice 
> for position 3
> I0514 20:45:44.028404 36809 master.cpp:2122] Ignoring register slave message 
> from slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com) as 
> admission is already in progress
> I0514 20:45:44.067612 36797 leveldb.cpp:343] Persisting action (403 bytes) to 
> leveldb took 49.185909ms
> I0514 20:45:44.067726 36797 replica.cpp:676] Persisted action at 3
> I0514 20:45:44.067757 36797 replica.cpp:661] Replica learned APPEND action at 
> position 3
> I0514 20:45:44.068979 36803 master.cpp:2122] Ignoring register slave message 
> from slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com) as 
> admission is already in progress
> I0514 20:45:44.069107 36811 registrar.cpp:479] Successfully updated 'registry'
> I0514 20:45:44.069480 36806 log.cpp:699] Attempting to truncate the log to 3
> I0514 20:45:44.069710 36803 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 4
> I0514 20:45:44.069906 36815 master.cpp:2174] Registered slave 
> 20140514-204543-1828659978-57356-36778-0 at slave(14)@10.35.255.108:57356 
> (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:44.069944 36815 master.cpp:3288] Adding slave 
> 20140514-204543-1828659978-57356-36778-0 at slave(14)@10.35.255.108:57356 
> (smfd-atr-11-sr1.devel.twitter.com) with cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000]
> I0514 20:45:44.070243 36802 slave.cpp:693] Registered with master 
> master@10.35.255.108:57356; given slave ID 
> 20140514-204543-1828659978-57356-36778-0
> I0514 20:45:44.070466 36802 slave.cpp:706] Checkpointing SlaveInfo to 
> '/tmp/SlaveRecoveryTest_0_RemoveNonCheckpointingFramework_CN6t3w/meta/slaves/20140514-204543-1828659978-57356-36778-0/slave.info'
> I0514 20:45:44.070592 36807 replica.cpp:508] Replica received write request 
> for position 4
> I0514 20:45:44.070713 36793 hierarchical_allocator_process.hpp:444] Added 
> slave 20140514-204543-1828659978-57356-36778-0 
> (smfd-atr-11-sr1.devel.twitter.com) with cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] available)
> I0514 20:45:44.071257 36793 master.cpp:2752] Sending 1 offers to framework 
> 20140514-204543-1828659978-57356-36778-0000
> W0514 20:45:44.076079 36794 sched.cpp:902] Attempting to launch task 
> 211518ed-53dd-446e-991d-d40d1b78df72 with an unknown offer 
> 20140514-204543-1828659978-57356-36778-0
> I0514 20:45:44.076701 36793 master.cpp:1810] Processing reply for offers: [ 
> 20140514-204543-1828659978-57356-36778-0 ] on slave 
> 20140514-204543-1828659978-57356-36778-0 at slave(14)@10.35.255.108:57356 
> (smfd-atr-11-sr1.devel.twitter.com) for framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.076871 36793 master.hpp:608] Adding task 
> 8f43e6d0-6f87-491a-84b6-83c103d118ec with resources cpus(*):1; mem(*):512 on 
> slave 20140514-204543-1828659978-57356-36778-0 
> (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:44.076935 36793 master.cpp:2927] Launching task 
> 8f43e6d0-6f87-491a-84b6-83c103d118ec of framework 
> 20140514-204543-1828659978-57356-36778-0000 with resources cpus(*):1; 
> mem(*):512 on slave 20140514-204543-1828659978-57356-36778-0 at 
> slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:44.077231 36793 master.hpp:608] Adding task 
> 211518ed-53dd-446e-991d-d40d1b78df72 with resources cpus(*):1; mem(*):512 on 
> slave 20140514-204543-1828659978-57356-36778-0 
> (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:44.077265 36794 slave.cpp:920] Got assigned task 
> 8f43e6d0-6f87-491a-84b6-83c103d118ec for framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.077301 36793 master.cpp:2927] Launching task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000 with resources cpus(*):1; 
> mem(*):512 on slave 20140514-204543-1828659978-57356-36778-0 at 
> slave(14)@10.35.255.108:57356 (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:44.077832 36800 hierarchical_allocator_process.hpp:589] Framework 
> 20140514-204543-1828659978-57356-36778-0000 filtered slave 
> 20140514-204543-1828659978-57356-36778-0 for 5secs
> I0514 20:45:44.077859 36794 slave.cpp:920] Got assigned task 
> 211518ed-53dd-446e-991d-d40d1b78df72 for framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.078202 36794 slave.cpp:1030] Launching task 
> 8f43e6d0-6f87-491a-84b6-83c103d118ec for framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.080832 36813 mesos_containerizer.cpp:523] Starting container 
> '62489f61-2c47-44be-8cee-7f2dd6079908' for executor 
> '8f43e6d0-6f87-491a-84b6-83c103d118ec' of framework 
> '20140514-204543-1828659978-57356-36778-0000'
> I0514 20:45:44.080852 36794 slave.cpp:1140] Queuing task 
> '8f43e6d0-6f87-491a-84b6-83c103d118ec' for executor 
> 8f43e6d0-6f87-491a-84b6-83c103d118ec of framework 
> '20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.080966 36794 slave.cpp:1030] Launching task 
> 211518ed-53dd-446e-991d-d40d1b78df72 for framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.083236 36815 mem.cpp:413] Started listening for OOM events for 
> container 62489f61-2c47-44be-8cee-7f2dd6079908
> I0514 20:45:44.083644 36798 mesos_containerizer.cpp:523] Starting container 
> '10130c74-7ab3-462e-9206-576c568294c9' for executor 
> '211518ed-53dd-446e-991d-d40d1b78df72' of framework 
> '20140514-204543-1828659978-57356-36778-0000'
> I0514 20:45:44.083695 36794 slave.cpp:1140] Queuing task 
> '211518ed-53dd-446e-991d-d40d1b78df72' for executor 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> '20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:44.083884 36815 mem.cpp:277] Updated 'memory.soft_limit_in_bytes' 
> to 512MB for container 62489f61-2c47-44be-8cee-7f2dd6079908
> I0514 20:45:44.084671 36804 cpushare.cpp:334] Updated 'cpu.shares' to 1024 
> (cpus 1) for container 62489f61-2c47-44be-8cee-7f2dd6079908
> I0514 20:45:44.085093 36815 mem.cpp:307] Updated 'memory.limit_in_bytes' to 
> 512MB for container 62489f61-2c47-44be-8cee-7f2dd6079908
> I0514 20:45:44.087297 36813 linux_launcher.cpp:222] Cloning child process 
> with flags = 0
> I0514 20:45:44.088526 36815 mem.cpp:413] Started listening for OOM events for 
> container 10130c74-7ab3-462e-9206-576c568294c9
> I0514 20:45:44.092516 36807 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 21.862391ms
> I0514 20:45:44.092689 36807 replica.cpp:676] Persisted action at 4
> I0514 20:45:44.093760 36816 replica.cpp:655] Replica received learned notice 
> for position 4
> I0514 20:45:44.100843 36816 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 7.03076ms
> I0514 20:45:44.100983 36816 leveldb.cpp:401] Deleting ~2 keys from leveldb 
> took 34100ns
> I0514 20:45:44.101011 36816 replica.cpp:676] Persisted action at 4
> I0514 20:45:44.101039 36816 replica.cpp:661] Replica learned TRUNCATE action 
> at position 4
> I0514 20:45:44.135138 36815 mem.cpp:277] Updated 'memory.soft_limit_in_bytes' 
> to 512MB for container 10130c74-7ab3-462e-9206-576c568294c9
> I0514 20:45:44.136108 36804 cpushare.cpp:334] Updated 'cpu.shares' to 1024 
> (cpus 1) for container 10130c74-7ab3-462e-9206-576c568294c9
> I0514 20:45:44.136598 36815 mem.cpp:307] Updated 'memory.limit_in_bytes' to 
> 512MB for container 10130c74-7ab3-462e-9206-576c568294c9
> I0514 20:45:44.239186 36797 linux_launcher.cpp:222] Cloning child process 
> with flags = 0
> I0514 20:45:44.243119 36797 mesos_containerizer.cpp:623] Fetching URIs for 
> container '62489f61-2c47-44be-8cee-7f2dd6079908' using command 
> '/home/vinod/mesos/build/src/mesos-fetcher'
> I0514 20:45:44.259407 36802 mesos_containerizer.cpp:623] Fetching URIs for 
> container '10130c74-7ab3-462e-9206-576c568294c9' using command 
> '/home/vinod/mesos/build/src/mesos-fetcher'
> I0514 20:45:45.252585 36810 slave.cpp:2312] Monitoring executor 
> '8f43e6d0-6f87-491a-84b6-83c103d118ec' of framework 
> '20140514-204543-1828659978-57356-36778-0000' in container 
> '62489f61-2c47-44be-8cee-7f2dd6079908'
> I0514 20:45:45.253128 36810 slave.cpp:2312] Monitoring executor 
> '211518ed-53dd-446e-991d-d40d1b78df72' of framework 
> '20140514-204543-1828659978-57356-36778-0000' in container 
> '10130c74-7ab3-462e-9206-576c568294c9'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0514 20:45:45.303606 37377 exec.cpp:131] Version: 0.19.0
> I0514 20:45:45.310616 36810 slave.cpp:1620] Got registration for executor 
> '211518ed-53dd-446e-991d-d40d1b78df72' of framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:45.311265 36810 slave.cpp:1739] Flushing queued task 
> 211518ed-53dd-446e-991d-d40d1b78df72 for executor 
> '211518ed-53dd-446e-991d-d40d1b78df72' of framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:45.312124 37438 exec.cpp:205] Executor registered on slave 
> 20140514-204543-1828659978-57356-36778-0
> I0514 20:45:45.312604 36800 mem.cpp:277] Updated 'memory.soft_limit_in_bytes' 
> to 512MB for container 10130c74-7ab3-462e-9206-576c568294c9
> I0514 20:45:45.312749 36813 cpushare.cpp:334] Updated 'cpu.shares' to 1024 
> (cpus 1) for container 10130c74-7ab3-462e-9206-576c568294c9
> Registered executor on smfd-atr-11-sr1.devel.twitter.com
> Starting task 211518ed-53dd-446e-991d-d40d1b78df72
> sh -c 'sleep 1000'
> Forked command at 37457
> I0514 20:45:45.319758 36800 slave.cpp:1975] Handling status update 
> TASK_RUNNING (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000 from 
> executor(1)@10.35.255.108:39534
> I0514 20:45:45.320253 36800 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000
> I0514 20:45:45.320729 36800 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000 to master@10.35.255.108:57356
> I0514 20:45:45.321002 36794 slave.cpp:2102] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for 
> task 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000 to executor(1)@10.35.255.108:39534
> I0514 20:45:45.321107 36811 master.cpp:2452] Status update TASK_RUNNING 
> (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000 from slave 
> 20140514-204543-1828659978-57356-36778-0 at slave(14)@10.35.255.108:57356 
> (smfd-atr-11-sr1.devel.twitter.com)
> I0514 20:45:45.322562 36812 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: b1910817-32f3-4ab6-8169-11a3c0ea3d7b) for task 
> 211518ed-53dd-446e-991d-d40d1b78df72 of framework 
> 20140514-204543-1828659978-57356-36778-0000
> ../../src/tests/slave_recovery_tests.cpp:1076: Failure
> Failed to wait 10secs for update2



--
This message was sent by Atlassian JIRA
(v6.2#6252)
