[ https://issues.apache.org/jira/browse/MESOS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609628#comment-13609628 ]

Thomas Marshall commented on MESOS-410:
---------------------------------------

I can't reproduce this, although it is clearly a real issue since two of the 
Jenkins builds hit it. Nor can I find any other issues with the allocator tests 
after running them all ~1000 times on several different configurations. That 
doesn't convince me there won't be more problems in the future, though. Now that 
I have a better idea of the kinds of things that make these tests fragile, I'll 
post a review in the next week or so with changes that make all of the allocator 
tests more robust, so we don't have to deal with these failures one by one.
                
> AllocatorTest/0.SchedulerFailover is flaky.
> -------------------------------------------
>
>                 Key: MESOS-410
>                 URL: https://issues.apache.org/jira/browse/MESOS-410
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Thomas Marshall
>
> Looks like there are some bad expectations in the SchedulerFailover test.
> [ RUN      ] AllocatorTest/0.SchedulerFailover
> I0321 20:49:55.008021  3226 master.cpp:309] Master started on 
> 67.195.138.60:33834
> I0321 20:49:55.008059  3226 master.cpp:324] Master ID: 
> 201303212049-1015726915-33834-3147
> I0321 20:49:55.008726  3226 master.cpp:603] Elected as master!
> I0321 20:49:55.008412  3228 sched.cpp:182] New master at 
> [email protected]:33834
> I0321 20:49:55.009618  3228 master.cpp:646] Registering framework 
> 201303212049-1015726915-33834-3147-0000 at scheduler(63)@67.195.138.60:33834
> I0321 20:49:55.010125  3228 sched.cpp:217] Framework registered with 
> 201303212049-1015726915-33834-3147-0000
> W0321 20:49:55.008816  3229 master.cpp:81] No whitelist given. Advertising 
> offers for all slaves
> I0321 20:49:55.008859  3230 hierarchical_allocator_process.hpp:236] 
> Initializing hierarchical allocator process with master : 
> [email protected]:33834
> I0321 20:49:55.060097  3230 hierarchical_allocator_process.hpp:268] Added 
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.065731  3230 hierarchical_allocator_process.hpp:668] No 
> resources available to allocate!
> I0321 20:49:55.066246  3230 hierarchical_allocator_process.hpp:599] Performed 
> allocation for 0 slaves in 517.48us
> I0321 20:49:55.008122  3224 slave.cpp:203] Slave started on 
> 73)@67.195.138.60:33834
> I0321 20:49:55.067206  3224 slave.cpp:204] Slave resources: cpus=3; mem=1024; 
> ports=[31000-32000]; disk=14036
> I0321 20:49:55.068222  3224 slave.cpp:453] New master detected at 
> [email protected]:33834
> I0321 20:49:55.068403  3224 slave.cpp:377] Finished recovery
> I0321 20:49:55.069236  3224 master.cpp:968] Attempting to register slave on 
> janus.apache.org at slave(73)@67.195.138.60:33834
> I0321 20:49:55.070163  3224 master.cpp:1224] Master now considering a slave 
> at janus.apache.org:33834 as active
> I0321 20:49:55.070575  3224 master.cpp:1862] Adding slave 
> 201303212049-1015726915-33834-3147-0 at janus.apache.org with cpus=3; 
> mem=1024; ports=[31000-32000]; disk=14036
> I0321 20:49:55.072634  3224 slave.cpp:487] Registered with master; given 
> slave ID 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.068428  3226 status_update_manager.cpp:132] New master 
> detected at [email protected]:33834
> I0321 20:49:55.072762  3228 hierarchical_allocator_process.hpp:395] Added 
> slave 201303212049-1015726915-33834-3147-0 (janus.apache.org) with cpus=3; 
> mem=1024; ports=[31000-32000]; disk=14036 (and cpus=3; mem=1024; 
> ports=[31000-32000]; disk=14036 available)
> I0321 20:49:55.074096  3228 hierarchical_allocator_process.hpp:660] Found 
> available resources: cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on 
> slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.074628  3228 hierarchical_allocator_process.hpp:686] Offering 
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0 to framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.075227  3228 hierarchical_allocator_process.hpp:619] Performed 
> allocation for slave 201303212049-1015726915-33834-3147-0 in 1.14ms
> I0321 20:49:55.075314  3224 master.hpp:309] Adding offer with resources 
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.076213  3224 master.cpp:1327] Sending 1 offers to framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.076750  3224 sched.cpp:282] Received 1 offers
> I0321 20:49:55.077636  3224 master.cpp:1534] Processing reply for offer 
> 201303212049-1015726915-33834-3147-0 on slave 
> 201303212049-1015726915-33834-3147-0 (janus.apache.org) for framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.077834  3224 master.hpp:289] Adding task with resources 
> cpus=1; mem=256 on slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.078308  3224 master.cpp:1651] Launching task 0 of framework 
> 201303212049-1015726915-33834-3147-0000 with resources cpus=1; mem=256 on 
> slave 201303212049-1015726915-33834-3147-0 (janus.apache.org)
> I0321 20:49:55.078922  3224 master.hpp:318] Removing offer with resources 
> cpus=3; mem=1024; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.078999  3229 slave.cpp:599] Got assigned task 0 for framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.079066  3230 hierarchical_allocator_process.hpp:471] Framework 
> 201303212049-1015726915-33834-3147-0000 left cpus=2; mem=768; 
> ports=[31000-32000]; disk=14036 unused on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.081300  3229 paths.hpp:302] Created executor directory 
> '/tmp/AllocatorTest_0_SchedulerFailover_oWPFSh/slaves/201303212049-1015726915-33834-3147-0/frameworks/201303212049-1015726915-33834-3147-0000/executors/default/runs/da91e171-2172-4e06-b425-1aa1e8522aa7'
> I0321 20:49:55.122453  3229 slave.cpp:436] Successfully attached file 
> '/tmp/AllocatorTest_0_SchedulerFailover_oWPFSh/slaves/201303212049-1015726915-33834-3147-0/frameworks/201303212049-1015726915-33834-3147-0000/executors/default/runs/da91e171-2172-4e06-b425-1aa1e8522aa7'
> I0321 20:49:55.122535  3225 exec.cpp:170] Executor started at: 
> executor(20)@67.195.138.60:33834 with pid 3147
> I0321 20:49:55.123455  3225 slave.cpp:1058] Got registration for executor 
> 'default' of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.124008  3225 slave.cpp:1133] Flushing queued tasks for 
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.124099  3229 exec.cpp:194] Executor registered on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.125076  3229 exec.cpp:258] Executor asked to run task '0'
> I0321 20:49:55.125663  3227 sched.cpp:422] Stopping framework 
> '201303212049-1015726915-33834-3147-0000'
> I0321 20:49:55.126713  3227 master.cpp:488] Framework 
> 201303212049-1015726915-33834-3147-0000 disconnected
> I0321 20:49:55.127213  3227 master.cpp:500] Giving framework 
> 201303212049-1015726915-33834-3147-0000 500.00ms to failover
> I0321 20:49:55.127316  3223 hierarchical_allocator_process.hpp:359] 
> Deactivated framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.127428  3226 sched.cpp:182] New master at 
> [email protected]:33834
> I0321 20:49:55.128818  3226 master.cpp:681] Re-registering framework 
> 201303212049-1015726915-33834-3147-0000 at scheduler(64)@67.195.138.60:33834
> I0321 20:49:55.162433  3226 master.cpp:700] Framework 
> 201303212049-1015726915-33834-3147-0000 failed over
> I0321 20:49:55.163038  3226 sched.cpp:217] Framework registered with 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.163110  3228 slave.cpp:968] Updating framework 
> 201303212049-1015726915-33834-3147-0000 pid to 
> scheduler(64)@67.195.138.60:33834
> I0321 20:49:55.163147  3224 hierarchical_allocator_process.hpp:327] Activated 
> framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.164475  3224 hierarchical_allocator_process.hpp:660] Found 
> available resources: cpus=2; mem=768; ports=[31000-32000]; disk=14036 on 
> slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.164984  3224 hierarchical_allocator_process.hpp:686] Offering 
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0 to framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.165602  3224 hierarchical_allocator_process.hpp:599] Performed 
> allocation for 1 slaves in 1.15ms
> I0321 20:49:55.165699  3230 master.hpp:309] Adding offer with resources 
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.166571  3230 master.cpp:1327] Sending 1 offers to framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.167109  3230 sched.cpp:282] Received 1 offers
> I0321 20:49:55.167677  3225 sched.cpp:422] Stopping framework 
> '201303212049-1015726915-33834-3147-0000'
> I0321 20:49:55.168128  3225 master.cpp:488] Framework 
> 201303212049-1015726915-33834-3147-0000 disconnected
> I0321 20:49:55.168639  3225 master.cpp:500] Giving framework 
> 201303212049-1015726915-33834-3147-0000 500.00ms to failover
> I0321 20:49:55.178027  3225 master.hpp:318] Removing offer with resources 
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 on slave 
> 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.168725  3229 hierarchical_allocator_process.hpp:359] 
> Deactivated framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.199956  3229 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=2; mem=768; ports=[31000-32000]; disk=14036 (total allocatable: cpus=2; 
> mem=768; ports=[31000-32000]; disk=14036) on slave 
> 201303212049-1015726915-33834-3147-0 from framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.678827  3227 master.cpp:1259] Framework failover timeout, 
> removing framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.699990  3227 master.hpp:300] Removing task with resources 
> cpus=1; mem=256 on slave 201303212049-1015726915-33834-3147-0
> I0321 20:49:55.705976  3227 hierarchical_allocator_process.hpp:544] Recovered 
> cpus=1; mem=256 (total allocatable: cpus=3; mem=1024; ports=[31000-32000]; 
> disk=14036) on slave 201303212049-1015726915-33834-3147-0 from framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.700172  3223 slave.cpp:901] Asked to shut down framework 
> 201303212049-1015726915-33834-3147-0000 by [email protected]:33834
> I0321 20:49:55.706676  3223 slave.cpp:906] Shutting down framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.707144  3223 slave.cpp:1693] Shutting down executor 'default' 
> of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.707645  3223 slave.cpp:386] Slave terminating
> I0321 20:49:55.708091  3223 slave.cpp:901] Asked to shut down framework 
> 201303212049-1015726915-33834-3147-0000 by @0.0.0.0:0
> I0321 20:49:55.708566  3223 slave.cpp:906] Shutting down framework 
> 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.709998  3223 slave.cpp:1693] Shutting down executor 'default' 
> of framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.710511  3223 master.cpp:537] Slave 
> 201303212049-1015726915-33834-3147-0(janus.apache.org) disconnected
> I0321 20:49:55.710978  3223 master.cpp:542] Removing disconnected slave 
> 201303212049-1015726915-33834-3147-0(janus.apache.org) because it is not 
> checkpointing!
> I0321 20:49:55.707731  3228 status_update_manager.cpp:233] Closing status 
> update streams for framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.712024  3228 status_update_manager.cpp:233] Closing status 
> update streams for framework 201303212049-1015726915-33834-3147-0000
> I0321 20:49:55.711609  3147 master.cpp:477] Master terminating
> ../../src/tests/allocator_tests.cpp:642: Failure
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
