[
https://issues.apache.org/jira/browse/MESOS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692207#comment-13692207
]
Vinod Kone commented on MESOS-514:
----------------------------------
https://reviews.apache.org/r/12064/
> FaultToleranceTest.ReconcileIncompleteTasks is flaky
> ----------------------------------------------------
>
> Key: MESOS-514
> URL: https://issues.apache.org/jira/browse/MESOS-514
> Project: Mesos
> Issue Type: Bug
> Reporter: Thomas Marshall
> Assignee: Vinod Kone
>
> https://hadrian.millennium.berkeley.edu/jenkins/job/Mesos-minimal/366/consoleFull
>
> https://hadrian.millennium.berkeley.edu/jenkins/job/Mesos/302/console
> [ RUN ] FaultToleranceTest.ReconcileIncompleteTasks
> I0617 03:47:37.707007 10205 master.cpp:228] Master started on 127.0.1.1:32998
> I0617 03:47:37.707059 10205 master.cpp:243] Master ID:
> 201306170347-16842879-32998-10185
> I0617 03:47:37.707181 10206 slave.cpp:219] Slave started on
> 78)@127.0.1.1:32998
> I0617 03:47:37.707248 10206 slave.cpp:220] Slave resources: cpus=2; mem=1024;
> ports=[31000-32000]; disk=1024
> W0617 03:47:37.707418 10204 master.cpp:83] No whitelist given. Advertising
> offers for all slaves
> I0617 03:47:37.707494 10205 master.cpp:526] Elected as master!
> I0617 03:47:37.707707 10205 master.cpp:569] Registering framework
> 201306170347-16842879-32998-10185-0000 at scheduler(68)@127.0.1.1:32998
> I0617 03:47:37.707897 10206 slave.cpp:540] New master detected at
> master@127.0.1.1:32998
> I0617 03:47:37.707911 10205 hierarchical_allocator_process.hpp:327] Added
> framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.708050 10206 slave.cpp:555] Postponing registration until
> recovery is complete
> I0617 03:47:37.708091 10207 status_update_manager.cpp:155] New master
> detected at master@127.0.1.1:32998
> I0617 03:47:37.708112 10206 slave.cpp:401] Finished recovery
> I0617 03:47:37.708359 10207 master.cpp:891] Attempting to register slave on
> ubuntu at slave(78)@127.0.1.1:32998
> I0617 03:47:37.708413 10207 master.cpp:1851] Adding slave
> 201306170347-16842879-32998-10185-0 at ubuntu with cpus=2; mem=1024;
> ports=[31000-32000]; disk=1024
> I0617 03:47:37.708566 10205 slave.cpp:600] Registered with master
> master@127.0.1.1:32998; given slave ID 201306170347-16842879-32998-10185-0
> I0617 03:47:37.708719 10205 hierarchical_allocator_process.hpp:449] Added
> slave 201306170347-16842879-32998-10185-0 (ubuntu) with cpus=2; mem=1024;
> ports=[31000-32000]; disk=1024 (and cpus=2; mem=1024; ports=[31000-32000];
> disk=1024 available)
> I0617 03:47:37.709128 10207 master.cpp:1239] Sending 1 offers to framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.710453 10207 master.cpp:1472] Processing reply for offer
> 201306170347-16842879-32998-10185-0 on slave
> 201306170347-16842879-32998-10185-0 (ubuntu) for framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.710728 10207 master.hpp:291] Adding task 1 with resources
> cpus=2; mem=1024; ports=[31000-32000]; disk=1024 on slave
> 201306170347-16842879-32998-10185-0
> I0617 03:47:37.710814 10207 master.cpp:1591] Launching task 1 of framework
> 201306170347-16842879-32998-10185-0000 with resources cpus=2; mem=1024;
> ports=[31000-32000]; disk=1024 on slave 201306170347-16842879-32998-10185-0
> (ubuntu)
> I0617 03:47:37.711158 10207 slave.cpp:740] Got assigned task 1 for framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.711511 10207 slave.cpp:838] Launching task 1 for framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.713037 10207 paths.hpp:303] Created executor directory
> '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_fTS0CG/slaves/201306170347-16842879-32998-10185-0/frameworks/201306170347-16842879-32998-10185-0000/executors/default/runs/9db6775d-9610-4eeb-ac97-50349467607a'
> I0617 03:47:37.713376 10207 slave.cpp:949] Queuing task '1' for executor
> default of framework '201306170347-16842879-32998-10185-0000
> I0617 03:47:37.713525 10207 slave.cpp:522] Successfully attached file
> '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_fTS0CG/slaves/201306170347-16842879-32998-10185-0/frameworks/201306170347-16842879-32998-10185-0000/executors/default/runs/9db6775d-9610-4eeb-ac97-50349467607a'
> I0617 03:47:37.714153 10206 slave.cpp:1396] Got registration for executor
> 'default' of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.714453 10206 slave.cpp:1511] Flushing queued task 1 for
> executor 'default' of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.716209 10206 slave.cpp:1693] Handling status update
> TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1 of
> framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.716699 10206 status_update_manager.cpp:290] Received status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000 with checkpoint=false
> I0617 03:47:37.716789 10206 status_update_manager.cpp:450] Creating
> StatusUpdate stream for task 1 of framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.716927 10206 status_update_manager.cpp:336] Forwarding status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000 to master@127.0.1.1:32998
> I0617 03:47:37.717314 10206 slave.cpp:1810] Sending acknowledgement for
> status update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for
> task 1 of framework 201306170347-16842879-32998-10185-0000 to
> executor(29)@127.0.1.1:32998
> I0617 03:47:37.717918 10205 slave.cpp:540] New master detected at
> master@127.0.1.1:32998
> I0617 03:47:37.718029 10207 status_update_manager.cpp:155] New master
> detected at master@127.0.1.1:32998
> W0617 03:47:37.718075 10207 status_update_manager.cpp:165] Resending status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.718139 10207 status_update_manager.cpp:336] Forwarding status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000 to master@127.0.1.1:32998
> W0617 03:47:37.718214 10206 master.cpp:944] Slave at
> slave(78)@127.0.1.1:32998 (ubuntu) is being allowed to re-register with an
> already in use id (201306170347-16842879-32998-10185-0)
> I0617 03:47:37.718479 10205 slave.cpp:636] Re-registered with master
> master@127.0.1.1:32998
> I0617 03:47:37.718602 10205 slave.cpp:1278] Updating framework
> 201306170347-16842879-32998-10185-0000 pid to scheduler(68)@127.0.1.1:32998
> I0617 03:47:37.718904 10206 master.cpp:1022] Status update from
> slave(78)@127.0.1.1:32998: task 1 of framework
> 201306170347-16842879-32998-10185-0000 is now in state TASK_FINISHED
> W0617 03:47:37.719336 10204 master.cpp:83] No whitelist given. Advertising
> offers for all slaves
> I0617 03:47:37.719465 10206 master.hpp:303] Removing task 1 with resources
> cpus=2; mem=1024; ports=[31000-32000]; disk=1024 on slave
> 201306170347-16842879-32998-10185-0
> W0617 03:47:37.719513 10204 status_update_manager.cpp:433] Resending status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.719606 10204 status_update_manager.cpp:336] Forwarding status
> update TASK_FINISHED (UUID: ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18) for task 1
> of framework 201306170347-16842879-32998-10185-0000 to master@127.0.1.1:32998
> I0617 03:47:37.719857 10206 hierarchical_allocator_process.hpp:616] Recovered
> cpus=2; mem=1024; ports=[31000-32000]; disk=1024 (total allocatable: cpus=2;
> mem=1024; ports=[31000-32000]; disk=1024) on slave
> 201306170347-16842879-32998-10185-0 from framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.719954 10206 master.cpp:1022] Status update from
> slave(78)@127.0.1.1:32998: task 1 of framework
> 201306170347-16842879-32998-10185-0000 is now in state TASK_FINISHED
> W0617 03:47:37.720090 10206 master.cpp:1065] Status update from
> slave(78)@127.0.1.1:32998 (ubuntu): error, couldn't lookup task 1
> ../../src/tests/fault_tolerance_tests.cpp:1504: Failure
> Mock function called more times than expected - returning directly.
> Function call: statusUpdate(0x7fff842068e0, @0x2b759c001a30 56-byte
> object <90-CF 70-89 75-2B 00-00 00-00 00-00 00-00 00-00 E0-12 01-9C 75-2B
> 00-00 08-B5 F1-00 00-00 00-00 08-B5 F1-00 00-00 00-00 02-00 00-00 00-00 00-00
> 03-00 00-00 00-00 00-00>)
> Expected: to be called once
> Actual: called twice - over-saturated and active
> I0617 03:47:37.720504 10204 master.cpp:385] Master terminating
> I0617 03:47:37.720610 10185 master.cpp:207] Shutting down master
> I0617 03:47:37.720737 10206 slave.cpp:496] Slave asked to shut down by
> master@127.0.1.1:32998
> I0617 03:47:37.721110 10206 slave.cpp:1113] Asked to shut down framework
> 201306170347-16842879-32998-10185-0000 by master@127.0.1.1:32998
> I0617 03:47:37.721174 10206 slave.cpp:1138] Shutting down framework
> 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.721268 10206 slave.cpp:2320] Shutting down executor 'default'
> of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.720873 10207 status_update_manager.cpp:360] Received status
> update acknowledgement ff73e62c-f0b4-47e3-b196-2ed3b6c6ec18 for task 1 of
> framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.721436 10207 status_update_manager.cpp:481] Cleaning up status
> update stream for task 1 of framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.722100 10204 hierarchical_allocator_process.hpp:412]
> Deactivated framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.722244 10204 hierarchical_allocator_process.hpp:367] Removed
> framework 201306170347-16842879-32998-10185-0000
> I0617 03:47:37.722342 10204 hierarchical_allocator_process.hpp:477] Removed
> slave 201306170347-16842879-32998-10185-0
> I0617 03:47:37.722759 10206 slave.cpp:451] Slave terminating
> I0617 03:47:37.722797 10206 slave.cpp:1113] Asked to shut down framework
> 201306170347-16842879-32998-10185-0000 by @0.0.0.0:0
> W0617 03:47:37.722825 10206 slave.cpp:1134] Ignoring shutdown framework
> 201306170347-16842879-32998-10185-0000 because it is terminating
> [ FAILED ] FaultToleranceTest.ReconcileIncompleteTasks (18 ms)