Benjamin Mahler created MESOS-533:
-------------------------------------
Summary: SlaveRecoveryTest/0.CleanupExecutor is flaky on Jenkins.
Key: MESOS-533
URL: https://issues.apache.org/jira/browse/MESOS-533
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Mahler
Assignee: Vinod Kone
Looks like there is a bad Result.get() call lurking in the code!
[ RUN ] SlaveRecoveryTest/0.CleanupExecutor
I0627 23:10:37.469480 11637 master.cpp:228] Master started on
67.195.138.61:42260
I0627 23:10:37.469604 11637 master.cpp:243] Master ID:
201306272310-1032504131-42260-11206
I0627 23:10:37.470226 11634 hierarchical_allocator_process.hpp:295]
Initializing hierarchical allocator process with master :
[email protected]:42260
W0627 23:10:37.470305 11632 master.cpp:83] No whitelist given. Advertising
offers for all slaves
I0627 23:10:37.470399 11637 master.cpp:526] Elected as master!
I0627 23:10:37.486925 11633 slave.cpp:112] Slave started on
22)@67.195.138.61:42260
I0627 23:10:37.487216 11633 slave.cpp:204] Slave resources: cpus=2; mem=1024;
ports=[31000-32000]; disk=1024
I0627 23:10:37.487869 11639 sched.cpp:177] New master at
[email protected]:42260
I0627 23:10:37.495751 11633 slave.cpp:528] New master detected at
[email protected]:42260
I0627 23:10:37.495753 11632 process_isolator.cpp:303] Recovering isolator
I0627 23:10:37.495971 11639 master.cpp:569] Registering framework
201306272310-1032504131-42260-11206-0000 at scheduler(18)@67.195.138.61:42260
I0627 23:10:37.497521 11632 sched.cpp:222] Framework registered with
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.496054 11634 status_update_manager.cpp:155] New master detected
at [email protected]:42260
I0627 23:10:37.497534 11639 hierarchical_allocator_process.hpp:327] Added
framework 201306272310-1032504131-42260-11206-0000
I0627 23:10:37.498798 11639 hierarchical_allocator_process.hpp:705] No
resources available to allocate!
I0627 23:10:37.499268 11639 hierarchical_allocator_process.hpp:667] Performed
allocation for 0 slaves in 472.547us
I0627 23:10:37.496024 11633 slave.cpp:543] Postponing registration until
recovery is complete
I0627 23:10:37.500259 11633 slave.cpp:389] Finished recovery
I0627 23:10:37.501011 11635 master.cpp:891] Attempting to register slave on
juno.apache.org at slave(22)@67.195.138.61:42260
I0627 23:10:37.501247 11635 master.cpp:1851] Adding slave
201306272310-1032504131-42260-11206-0 at juno.apache.org with cpus=2; mem=1024;
ports=[31000-32000]; disk=1024
I0627 23:10:37.501842 11638 slave.cpp:588] Registered with master
[email protected]:42260; given slave ID 201306272310-1032504131-42260-11206-0
I0627 23:10:37.501883 11635 hierarchical_allocator_process.hpp:449] Added slave
201306272310-1032504131-42260-11206-0 (juno.apache.org) with cpus=2; mem=1024;
ports=[31000-32000]; disk=1024 (and cpus=2; mem=1024; ports=[31000-32000];
disk=1024 available)
I0627 23:10:37.502459 11638 paths.hpp:336] Created slave directory
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0'
Checkpointing SlaveInfo to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/slave.info'
I0627 23:10:37.502784 11635 hierarchical_allocator_process.hpp:727] Offering
cpus=2; mem=1024; ports=[31000-32000]; disk=1024 on slave
201306272310-1032504131-42260-11206-0 to framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.503823 11635 hierarchical_allocator_process.hpp:687] Performed
allocation for slave 201306272310-1032504131-42260-11206-0 in 1.055ms
I0627 23:10:37.503890 11638 master.hpp:313] Adding offer
201306272310-1032504131-42260-11206-0 with resources cpus=2; mem=1024;
ports=[31000-32000]; disk=1024 on slave 201306272310-1032504131-42260-11206-0
I0627 23:10:37.505142 11638 master.cpp:1239] Sending 1 offers to framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.507694 11633 master.cpp:1472] Processing reply for offer
201306272310-1032504131-42260-11206-0 on slave
201306272310-1032504131-42260-11206-0 (juno.apache.org) for framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.507940 11633 master.hpp:291] Adding task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 with resources cpus=2; mem=1024;
ports=[31000-32000]; disk=1024 on slave 201306272310-1032504131-42260-11206-0
I0627 23:10:37.508301 11633 master.cpp:1591] Launching task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000 with resources cpus=2; mem=1024;
ports=[31000-32000]; disk=1024 on slave 201306272310-1032504131-42260-11206-0
(juno.apache.org)
I0627 23:10:37.508805 11636 slave.cpp:738] Got assigned task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 for framework
201306272310-1032504131-42260-11206-0000
Checkpointing FrameworkInfo to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/framework.info'
Checkpointing 'scheduler(18)@67.195.138.61:42260' to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/framework.pid'
I0627 23:10:37.510175 11636 slave.cpp:836] Launching task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 for framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.511561 11636 paths.hpp:303] Created executor directory
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554'
Checkpointing ExecutorInfo to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/executor.info'
I0627 23:10:37.511950 11636 paths.hpp:303] Created executor directory
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554'
Checkpointing Task to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554/tasks/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/task.info'
I0627 23:10:37.513481 11635 process_isolator.cpp:99] Launching
d5edf5d2-94e1-47d6-ae45-03e8844d5f36
(/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/src/mesos-executor)
in
/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554
with resources ' for framework 201306272310-1032504131-42260-11206-0000
I0627 23:10:37.508839 11633 master.hpp:323] Removing offer
201306272310-1032504131-42260-11206-0 with resources cpus=2; mem=1024;
ports=[31000-32000]; disk=1024 on slave 201306272310-1032504131-42260-11206-0
I0627 23:10:37.518288 11636 slave.cpp:947] Queuing task
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' for executor
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
'201306272310-1032504131-42260-11206-0000
I0627 23:10:37.518731 11636 slave.cpp:510] Successfully attached file
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554'
I0627 23:10:37.519456 11635 process_isolator.cpp:161] Forked executor at 12414
Checkpointing forked pid 12414
Checkpointing '12414' to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554/pids/forked.pid'
Fetching resources into
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554'
I0627 23:10:37.561638 11640 process.cpp:878] Socket closed while receiving
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0627 23:10:37.600602 12434 process.cpp:1426] libprocess is initialized on
67.195.138.61:43332 for 8 cpus
I0627 23:10:37.610895 12450 exec.cpp:170] Executor started at:
executor(1)@67.195.138.61:43332 with pid 12434
I0627 23:10:37.611531 11637 slave.cpp:1394] Got registration for executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
Checkpointing 'executor(1)@67.195.138.61:43332' to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554/pids/libprocess.pid'
I0627 23:10:37.612431 12450 exec.cpp:194] Executor registered on slave
201306272310-1032504131-42260-11206-0
Registered executor on juno.apache.org
I0627 23:10:37.612457 12456 process.cpp:878] Socket closed while receiving
I0627 23:10:37.612041 11637 slave.cpp:1509] Flushing queued task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 for executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.613847 12456 process.cpp:878] Socket closed while receiving
I0627 23:10:37.613988 12452 exec.cpp:258] Executor asked to run task
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36'
Starting task d5edf5d2-94e1-47d6-ae45-03e8844d5f36
Forked command at 12457
sh -c 'sleep 1000'
I0627 23:10:37.615681 12452 exec.cpp:404] Executor sending status update
TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.615947 11633 slave.cpp:1691] Handling status update TASK_RUNNING
(UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000 from executor(1)@67.195.138.61:43332
I0627 23:10:37.616220 11633 status_update_manager.cpp:290] Received status
update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000 with checkpoint=true
I0627 23:10:37.616529 11633 status_update_manager.cpp:450] Creating
StatusUpdate stream for task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.617097 11633 status_update_manager.hpp:324] Checkpointing UPDATE
for status update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for
task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.706604 11633 status_update_manager.cpp:336] Forwarding status
update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000 to [email protected]:42260
I0627 23:10:37.707031 11636 slave.cpp:1802] Status update manager successfully
handled status update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907)
for task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.707044 11633 master.cpp:1022] Status update from
slave(22)@67.195.138.61:42260: task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of
framework 201306272310-1032504131-42260-11206-0000 is now in state TASK_RUNNING
I0627 23:10:37.707169 11636 slave.cpp:1808] Sending acknowledgement for status
update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000 to executor(1)@67.195.138.61:43332
I0627 23:10:37.708674 12456 process.cpp:878] Socket closed while receiving
I0627 23:10:37.708730 12448 exec.cpp:290] Executor received status update
acknowledgement 1c5fd31b-7a14-4573-a116-ff8baac5c907 for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.708386 11636 status_update_manager.cpp:360] Received status
update acknowledgement 1c5fd31b-7a14-4573-a116-ff8baac5c907 for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.710053 11636 status_update_manager.hpp:324] Checkpointing ACK
for status update TASK_RUNNING (UUID: 1c5fd31b-7a14-4573-a116-ff8baac5c907) for
task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.904566 11636 slave.cpp:1342] Status update manager successfully
handled status update acknowledgement 1c5fd31b-7a14-4573-a116-ff8baac5c907 for
task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.905002 11206 slave.cpp:439] Slave terminating
I0627 23:10:37.905333 11632 master.cpp:460] Slave
201306272310-1032504131-42260-11206-0 (juno.apache.org) disconnected
I0627 23:10:37.906338 11636 slave.cpp:112] Slave started on
23)@67.195.138.61:42260
I0627 23:10:37.906491 11637 hierarchical_allocator_process.hpp:667] Performed
allocation for 1 slaves in 41.695us
I0627 23:10:37.906744 11636 slave.cpp:204] Slave resources: cpus=2; mem=1024;
ports=[31000-32000]; disk=1024
I0627 23:10:37.907874 11636 state.cpp:33] Recovering state from
/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta
I0627 23:10:37.909067 11632 status_update_manager.cpp:177] Recovering status
update manager
I0627 23:10:37.909111 11632 status_update_manager.cpp:181] Recovering executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.909427 11636 slave.cpp:528] New master detected at
[email protected]:42260
I0627 23:10:37.909481 11632 status_update_manager.cpp:450] Creating
StatusUpdate stream for task d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.910508 11632 status_update_manager.hpp:288] Replaying status
update stream for task d5edf5d2-94e1-47d6-ae45-03e8844d5f36
I0627 23:10:37.911044 11635 process_isolator.cpp:303] Recovering isolator
I0627 23:10:37.911408 11635 process_isolator.cpp:311] Recovering executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.909950 11636 slave.cpp:537] Skipping registration because slave
is in 'cleanup' mode
I0627 23:10:37.912358 11636 slave.cpp:2819] Recovering executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.912951 11636 slave.cpp:2662] Sending shutdown to executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000 to executor(1)@67.195.138.61:43332
I0627 23:10:37.913314 11636 slave.cpp:2327] Shutting down executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.914327 12456 process.cpp:878] Socket closed while receiving
I0627 23:10:37.914351 12455 exec.cpp:323] Executor asked to shutdown
Shutting down
I0627 23:10:37.914877 12454 exec.cpp:75] Scheduling shutdown of the executor
I0627 23:10:37.911058 11632 status_update_manager.cpp:155] New master detected
at [email protected]:42260
I0627 23:10:37.915552 11636 slave.cpp:510] Successfully attached file
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/executors/d5edf5d2-94e1-47d6-ae45-03e8844d5f36/runs/a671f481-eaea-4ffb-bde3-93efa1c7c554'
I0627 23:10:37.915717 11636 slave.cpp:389] Finished recovery
W0627 23:10:37.916795 11636 master.cpp:944] Slave at
slave(23)@67.195.138.61:42260 (juno.apache.org) is being allowed to re-register
with an already in use id (201306272310-1032504131-42260-11206-0)
W0627 23:10:37.916992 11636 master.cpp:944] Slave at
slave(23)@67.195.138.61:42260 (juno.apache.org) is being allowed to re-register
with an already in use id (201306272310-1032504131-42260-11206-0)
I0627 23:10:37.917031 11634 slave.cpp:629] Re-registered with master
[email protected]:42260
I0627 23:10:37.918179 11634 slave.cpp:1276] Updating framework
201306272310-1032504131-42260-11206-0000 pid to
scheduler(18)@67.195.138.61:42260
Checkpointing 'scheduler(18)@67.195.138.61:42260' to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/framework.pid'
W0627 23:10:37.918856 11634 slave.cpp:643] Already re-registered with master
[email protected]:42260
I0627 23:10:37.919612 11634 slave.cpp:1276] Updating framework
201306272310-1032504131-42260-11206-0000 pid to
scheduler(18)@67.195.138.61:42260
Checkpointing 'scheduler(18)@67.195.138.61:42260' to
'/tmp/SlaveRecoveryTest_0_CleanupExecutor_BF7VIa/meta/slaves/201306272310-1032504131-42260-11206-0/frameworks/201306272310-1032504131-42260-11206-0000/framework.pid'
I0627 23:10:37.926779 11635 hierarchical_allocator_process.hpp:667] Performed
allocation for 1 slaves in 53.179us
I0627 23:10:37.936836 11639 hierarchical_allocator_process.hpp:667] Performed
allocation for 1 slaves in 39.736us
I0627 23:10:37.947145 11638 hierarchical_allocator_process.hpp:667] Performed
allocation for 1 slaves in 34.052us
W0627 23:10:37.956991 11637 master.cpp:83] No whitelist given. Advertising
offers for all slaves
I0627 23:10:37.957289 11633 slave.cpp:2396] Killing executor
'd5edf5d2-94e1-47d6-ae45-03e8844d5f36' of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.957679 11632 hierarchical_allocator_process.hpp:667] Performed
allocation for 1 slaves in 50.929us
Waited on process 12457, returned status 9
Performing killtree operation on 12457
Stopped 12457
Children of 12457: { 12459 }
Members of group 12457: { 12457, 12459 }
Members of session 12457: { 12457, 12459 }
Stopped 12459
Children of 12459: { }
Members of group 12457: { 12457, 12459 }
Members of session 12457: { 12457, 12459 }
Signaled 12457
Signaled 12459
Result::get() but state == NONE
I0627 23:10:37.968878 12455 exec.cpp:404] Executor sending status update
TASK_FAILED (UUID: 45764b93-5acd-4ba8-923e-e39fd25a8f8c) for task
d5edf5d2-94e1-47d6-ae45-03e8844d5f36 of framework
201306272310-1032504131-42260-11206-0000
I0627 23:10:37.976610 12456 process.cpp:876] Socket error while receiving:
Connection reset by peer
make[3]: *** [check-local] Aborted
I0627 23:10:37.976788 12454 exec.cpp:348] Ignoring exited event because the
driver is aborted!
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira