[
https://issues.apache.org/jira/browse/MESOS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609628#comment-13609628
]
Thomas Marshall commented on MESOS-410:
---------------------------------------
I can't actually reproduce this (although it clearly is a real issue since two
of the Jenkins builds both hit this problem) nor can I find any other issues
with the allocator tests after running them all ~1000 on several different
configurations. However, that doesn't convince me that there won't be more
problems in the future, and now that I have a better idea of the kinds of
things that make these tests fragile, I'll be posting a review in the next week
or so with some changes to make all of the allocator tests more robust so we
don't have to deal with these things one by one.
> AllocatorTest/0.SchedulerFailover is flaky.
> -------------------------------------------
>
> Key: MESOS-410
> URL: https://issues.apache.org/jira/browse/MESOS-410
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
> Assignee: Thomas Marshall
>
> Looks like there are some bad expectations in the SchedulerFailover test.
> [ RUN ] AllocatorTest/0.SchedulerFailover
> I0321 20:49:55.008021 3226 master.cpp:309] Master started on
> 67.195.138.60:33834
> I0321 20:49:55.008059 3226 master.cpp:324] Master ID:
> 201303212049-1015726915-33834-3147
> I0321 20:49:55.008726 3226 master.cpp:603] Elected as master!
> I0321 20:49:55.008412 3228 sched.cpp:182] New master at
> [email protected]:33834
> I0321 20:49:55.009618 3228 master.cpp:646] Registering framework
> 201303212049-1015726915-33834-3147-0000 at scheduler(63)@67.195.138.60:33834
> I0321 20:49:55.010125 3228 sched.cpp:217] Framework registered with
> 201303212049-1015726915-33834-3147-0000
> W0321 20:49:55.008816 3229 master.cpp:81] No whitelist given. Advertising
> offers for all slaves
> I0321 20:49:55.008859 3230 hierarchical_allocator_process.hpp:236]
> Initializing hierarchical allocator process with master :
> [email protected]:33834
> I0321 20:49:55.060097 3230 hierarchical_allocator_process.hpp:268] Added
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.065731 3230 hierarchical_allocator_process.hpp:668] No
> resources available to allocate!
> I0321 20:49:55.066246 3230 hierarchical_allocator_process.hpp:599] Performed
> allocation for 0 slaves in 517.48us
> I0321 20:49:55.008122 3224 slave.cpp:203] Slave started on
> 73)@67.195.138.60:33834
> I0321 20:49:55.067206 3224 slave.cpp:204] Slave resources: cpus=3; mem=1024;
> ports=[31000-32000]; disk=14036
> I0321 20:49:55.068222 3224 slave.cpp:453] New master detected at
> [email protected]:33834
> I0321 20:49:55.068403 3224 slave.cpp:377] Finished recovery
> I0321 20:49:55.069236 3224 master.cpp:968] Attempting to register slave on
> janus.apache.org at slave(73)@67.195.138.60:33834
> I0321 20:49:55.070163 3224 master.cpp:1224] Master now considering a slave
> at janus.apache.org:33834 as active
> I0321 20:49:55.070575 3224 master.cpp:1862] Adding slave
> 201303212049-1015726915-33834-3147-0 at janus.apache.org with cpus=3;
> mem=1024; ports=[31000-32000]; disk=14036
> I0321 20:49:55.072634 3224 slave.cpp:487] Registered with master; given
> slave ID 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.068428 3226 status_update_manager.cpp:132] New master
> detected at [email protected]:33834
> I0321 20:49:55.072762 3228 hierarchical_allocator_process.hpp:395] Added
> slave 201303212049-1015726915-33834-3147-0 (janus.apache.org) with cpus=3;
> mem=1024; ports=[31000-32000]; disk=14036 (and cpus=3; mem=1024;
> ports=[31000-32000]; disk=14036 available)
> I0321 20:49:55.074096 3228 hierarchical_allocator_process.hpp:660] Found
> available resources: cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on
> slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.074628 3228 hierarchical_allocator_process.hpp:686] Offering
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0 to framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.075227 3228 hierarchical_allocator_process.hpp:619] Performed
> allocation for slave 201303212049-1015726915-33834-3147-0 in 1.14ms
> I0321 20:49:55.075314 3224 master.hpp:309] Adding offer with resources
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.076213 3224 master.cpp:1327] Sending 1 offers to framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.076750 3224 sched.cpp:282] Received 1 offers
> I0321 20:49:55.077636 3224 master.cpp:1534] Processing reply for offer
> 201303212049-1015726915-33834-3147-0 on slave
> 201303212049-1015726915-33834-3147-0 (janus.apache.org) for framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.077834 3224 master.hpp:289] Adding task with resources
> cpus=1; mem=256 on slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.078308 3224 master.cpp:1651] Launching task 0 of framework
> 201303212049-1015726915-33834-3147-0000 with resources cpus=1; mem=256 on
> slave 201303212049-1015726915-33834-3147-0 (janus.apache.org)
> I0321 20:49:55.078922 3224 master.hpp:318] Removing offer with resources
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.078999 3229 slave.cpp:599] Got assigned task 0 for framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.079066 3230 hierarchical_allocator_process.hpp:471] Framework
> 201303212049-1015726915-33834-3147-0000 left cpus=2; mem=768;
> ports=[31000-32000]; disk=14036 unused on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.081300 3229 paths.hpp:302] Created executor directory
> '/tmp/AllocatorTest_0_SchedulerFailover_oWPFSh/slaves/201303212049-1015726915-33834-3147-0/frameworks/201303212049-1015726915-33834-3147-0000/executors/default/runs/da91e171-2172-4e06-b425-1aa1e8522aa7'
> I0321 20:49:55.122453 3229 slave.cpp:436] Successfully attached file
> '/tmp/AllocatorTest_0_SchedulerFailover_oWPFSh/slaves/201303212049-1015726915-33834-3147-0/frameworks/201303212049-1015726915-33834-3147-0000/executors/default/runs/da91e171-2172-4e06-b425-1aa1e8522aa7'
> I0321 20:49:55.122535 3225 exec.cpp:170] Executor started at:
> executor(20)@67.195.138.60:33834 with pid 3147
> I0321 20:49:55.123455 3225 slave.cpp:1058] Got registration for executor
> 'default' of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.124008 3225 slave.cpp:1133] Flushing queued tasks for
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.124099 3229 exec.cpp:194] Executor registered on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.125076 3229 exec.cpp:258] Executor asked to run task '0'
> I0321 20:49:55.125663 3227 sched.cpp:422] Stopping framework
> '201303212049-1015726915-33834-3147-0000'
> I0321 20:49:55.126713 3227 master.cpp:488] Framework
> 201303212049-1015726915-33834-3147-0000 disconnected
> I0321 20:49:55.127213 3227 master.cpp:500] Giving framework
> 201303212049-1015726915-33834-3147-0000 500.00ms to failover
> I0321 20:49:55.127316 3223 hierarchical_allocator_process.hpp:359]
> Deactivated framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.127428 3226 sched.cpp:182] New master at
> [email protected]:33834
> I0321 20:49:55.128818 3226 master.cpp:681] Re-registering framework
> 201303212049-1015726915-33834-3147-0000 at scheduler(64)@67.195.138.60:33834
> I0321 20:49:55.162433 3226 master.cpp:700] Framework
> 201303212049-1015726915-33834-3147-0000 failed over
> I0321 20:49:55.163038 3226 sched.cpp:217] Framework registered with
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.163110 3228 slave.cpp:968] Updating framework
> 201303212049-1015726915-33834-3147-0000 pid to
> scheduler(64)@67.195.138.60:33834
> I0321 20:49:55.163147 3224 hierarchical_allocator_process.hpp:327] Activated
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.164475 3224 hierarchical_allocator_process.hpp:660] Found
> available resources: cpus=2; mem=768; ports=[31000-32000]; disk=14036 on
> slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.164984 3224 hierarchical_allocator_process.hpp:686] Offering
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0 to framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.165602 3224 hierarchical_allocator_process.hpp:599] Performed
> allocation for 1 slaves in 1.15ms
> I0321 20:49:55.165699 3230 master.hpp:309] Adding offer with resources
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.166571 3230 master.cpp:1327] Sending 1 offers to framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.167109 3230 sched.cpp:282] Received 1 offers
> I0321 20:49:55.167677 3225 sched.cpp:422] Stopping framework
> '201303212049-1015726915-33834-3147-0000'
> I0321 20:49:55.168128 3225 master.cpp:488] Framework
> 201303212049-1015726915-33834-3147-0000 disconnected
> I0321 20:49:55.168639 3225 master.cpp:500] Giving framework
> 201303212049-1015726915-33834-3147-0000 500.00ms to failover
> I0321 20:49:55.178027 3225 master.hpp:318] Removing offer with resources
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.168725 3229 hierarchical_allocator_process.hpp:359]
> Deactivated framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.199956 3229 hierarchical_allocator_process.hpp:544] Recovered
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 (total allocatable: cpus=2;
> mem=768; ports=[31000-32000]; disk=14036) on slave
> 201303212049-1015726915-33834-3147-0 from framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.678827 3227 master.cpp:1259] Framework failover timeout,
> removing framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.699990 3227 master.hpp:300] Removing task with resources
> cpus=1; mem=256 on slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.705976 3227 hierarchical_allocator_process.hpp:544] Recovered
> cpus=1; mem=256 (total allocatable: cpus=3; mem=1024; ports=[31000-32000];
> disk=14036) on slave 201303212049-1015726915-33834-3147-0 from framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.700172 3223 slave.cpp:901] Asked to shut down framework
> 201303212049-1015726915-33834-3147-0000 by [email protected]:33834
> I0321 20:49:55.706676 3223 slave.cpp:906] Shutting down framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.707144 3223 slave.cpp:1693] Shutting down executor 'default'
> of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.707645 3223 slave.cpp:386] Slave terminating
> I0321 20:49:55.708091 3223 slave.cpp:901] Asked to shut down framework
> 201303212049-1015726915-33834-3147-0000 by @0.0.0.0:0
> I0321 20:49:55.708566 3223 slave.cpp:906] Shutting down framework
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.709998 3223 slave.cpp:1693] Shutting down executor 'default'
> of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.710511 3223 master.cpp:537] Slave
> 201303212049-1015726915-33834-3147-0(janus.apache.org) disconnected
> I0321 20:49:55.710978 3223 master.cpp:542] Removing disconnected slave
> 201303212049-1015726915-33834-3147-0(janus.apache.org) because it is not
> checkpointing!
> I0321 20:49:55.707731 3228 status_update_manager.cpp:233] Closing status
> update streams for framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.712024 3228 status_update_manager.cpp:233] Closing status
> update streams for framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.711609 3147 master.cpp:477] Master terminating
> ../../src/tests/allocator_tests.cpp:642: Failure
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
> Expected: to be called once
> Actual: never called - unsatisfied and active
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira