[
https://issues.apache.org/jira/browse/MESOS-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246771#comment-16246771
]
Alexander Rukletsov commented on MESOS-7519:
--------------------------------------------
It looks like review https://reviews.apache.org/r/55893/ does not really fix
what it aims to fix. The problem is in [this
loop|https://github.com/apache/mesos/blob/master/src/tests/oversubscription_tests.cpp?utf8=%E2%9C%93#L549-L552]:
{{offers.get()}} removes items from the collection we iterate over and hence
defeats the purpose of merging resources from all offers.
> OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable is flaky
> -------------------------------------------------------------------------
>
> Key: MESOS-7519
> URL: https://issues.apache.org/jira/browse/MESOS-7519
> Project: Mesos
> Issue Type: Bug
> Reporter: Neil Conway
> Labels: flaky-test, mesosphere
> Attachments: RescindRevocableOfferWithIncreasedRevocable-badrun.txt
>
>
> {noformat}
> [ RUN ] OversubscriptionTest.RescindRevocableOfferWithIncreasedRevocable
> I0517 10:43:58.154139 2927604672 cluster.cpp:162] Creating default 'local'
> authorizer
> I0517 10:43:58.155712 260517888 master.cpp:436] Master
> a70cd84f-96ed-417f-8285-04416cf4ecb5 (neils-macbook-pro.local) started on
> 169.254.161.216:51870
> I0517 10:43:58.155740 260517888 master.cpp:438] Flags at startup: --acls=""
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate_agents="true" --authenticate_frameworks="true"
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authenticators="crammd5"
> --authorizers="local"
> --credentials="/private/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/C5v4kE/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_framework_authenticators="basic"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
> --recovery_agent_removal_limit="100%" --registry="in_memory"
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
> --registry_store_timeout="100secs" --registry_strict="false"
> --root_submissions="true" --user_sorter="drf" --version="false"
> --webui_dir="/usr/local/share/mesos/webui"
> --work_dir="/private/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/C5v4kE/master"
> --zk_session_timeout="10secs"
> I0517 10:43:58.155948 260517888 master.cpp:488] Master only allowing
> authenticated frameworks to register
> I0517 10:43:58.155958 260517888 master.cpp:502] Master only allowing
> authenticated agents to register
> I0517 10:43:58.155963 260517888 master.cpp:515] Master only allowing
> authenticated HTTP frameworks to register
> I0517 10:43:58.155968 260517888 credentials.hpp:37] Loading credentials for
> authentication from
> '/private/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/C5v4kE/credentials'
> I0517 10:43:58.156102 260517888 master.cpp:560] Using default 'crammd5'
> authenticator
> I0517 10:43:58.156154 260517888 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readonly'
> I0517 10:43:58.156276 260517888 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readwrite'
> I0517 10:43:58.156409 260517888 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-scheduler'
> I0517 10:43:58.156517 260517888 master.cpp:640] Authorization enabled
> I0517 10:43:58.157871 263200768 master.cpp:2161] Elected as the leading
> master!
> I0517 10:43:58.157883 263200768 master.cpp:1700] Recovering from registrar
> I0517 10:43:58.158254 261591040 registrar.cpp:389] Successfully fetched the
> registry (0B) in 0ns
> I0517 10:43:58.158299 261591040 registrar.cpp:493] Applied 1 operations in
> 14us; attempting to update the registry
> I0517 10:43:58.158640 261591040 registrar.cpp:550] Successfully updated the
> registry in 0ns
> I0517 10:43:58.158766 261591040 registrar.cpp:422] Successfully recovered
> registrar
> I0517 10:43:58.158968 259444736 master.cpp:1799] Recovered 0 agents from the
> registry (164B); allowing 10mins for agents to re-register
> I0517 10:43:58.162422 2927604672 containerizer.cpp:221] Using isolation:
> posix/cpu,posix/mem,filesystem/posix
> I0517 10:43:58.162828 2927604672 provisioner.cpp:249] Using default backend
> 'copy'
> I0517 10:43:58.163873 2927604672 cluster.cpp:448] Creating default 'local'
> authorizer
> I0517 10:43:58.164876 262127616 slave.cpp:225] Mesos agent started on
> (7)@169.254.161.216:51870
> I0517 10:43:58.164902 262127616 slave.cpp:226] Flags at startup: --acls=""
> --appc_simple_discovery_uri_prefix="http://"
> --appc_store_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/mesos/store/appc"
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true"
> --authenticatee="crammd5" --authentication_backoff_factor="1secs"
> --authorizer="local" --container_disk_watch_interval="15secs"
> --containerizers="mesos"
> --credential="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M/credential"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
> --docker_stop_timeout="0ns"
> --docker_store_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/mesos/store/docker"
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M/fetch"
> --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false"
> --hostname_lookup="true" --http_command_executor="false"
> --http_credentials="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M/http_credentials"
> --http_heartbeat_interval="30secs" --initialize_driver_logging="true"
> --isolation="posix/cpu,posix/mem" --launcher="posix"
> --launcher_dir="/Users/neilc/ms/build-mesos/src" --logbufsecs="0"
> --logging_level="INFO" --max_completed_executors_per_framework="150"
> --oversubscribed_resources_interval="15secs" --port="5051"
> --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect"
> --recovery_timeout="15mins" --registration_backoff_factor="10ms"
> --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
> --runtime_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M"
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
> --switch_user="true" --version="false"
> --work_dir="/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_MuoZUz"
> I0517 10:43:58.165099 262127616 credentials.hpp:86] Loading credential for
> authentication from
> '/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M/credential'
> I0517 10:43:58.165222 262127616 slave.cpp:258] Agent using credential for:
> test-principal
> I0517 10:43:58.165240 262127616 credentials.hpp:37] Loading credentials for
> authentication from
> '/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_TTBI1M/http_credentials'
> I0517 10:43:58.165406 262127616 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-agent-readonly'
> I0517 10:43:58.165530 262127616 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-agent-readwrite'
> I0517 10:43:58.166236 262127616 slave.cpp:529] Agent resources: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0517 10:43:58.166291 262127616 slave.cpp:537] Agent attributes: [ ]
> I0517 10:43:58.166301 262127616 slave.cpp:542] Agent hostname:
> neils-macbook-pro.local
> I0517 10:43:58.166380 261054464 status_update_manager.cpp:177] Pausing
> sending status updates
> I0517 10:43:58.166661 2927604672 sched.cpp:232] Version: 1.4.0
> I0517 10:43:58.166970 259981312 sched.cpp:336] New master detected at
> [email protected]:51870
> I0517 10:43:58.166982 263200768 state.cpp:62] Recovering state from
> '/var/folders/g7/cj4h93hx15d_5195_2436lc00000gn/T/OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_MuoZUz/meta'
> I0517 10:43:58.167007 259981312 sched.cpp:407] Authenticating with master
> [email protected]:51870
> I0517 10:43:58.167017 259981312 sched.cpp:414] Using default CRAM-MD5
> authenticatee
> I0517 10:43:58.167127 259444736 authenticatee.cpp:121] Creating new client
> SASL connection
> I0517 10:43:58.167276 261054464 status_update_manager.cpp:203] Recovering
> status update manager
> I0517 10:43:58.167337 260517888 master.cpp:7475] Authenticating
> [email protected]:51870
> I0517 10:43:58.167469 259444736 containerizer.cpp:608] Recovering
> containerizer
> I0517 10:43:58.167549 263200768 authenticator.cpp:98] Creating new server
> SASL connection
> I0517 10:43:58.167659 259981312 authenticatee.cpp:213] Received SASL
> authentication mechanisms: CRAM-MD5
> I0517 10:43:58.167681 259981312 authenticatee.cpp:239] Attempting to
> authenticate with mechanism 'CRAM-MD5'
> I0517 10:43:58.167770 261591040 authenticator.cpp:204] Received SASL
> authentication start
> I0517 10:43:58.167809 261591040 authenticator.cpp:326] Authentication
> requires more steps
> I0517 10:43:58.167899 262127616 authenticatee.cpp:259] Received SASL
> authentication step
> I0517 10:43:58.168004 262664192 authenticator.cpp:232] Received SASL
> authentication step
> I0517 10:43:58.168036 262664192 authenticator.cpp:318] Authentication success
> I0517 10:43:58.168140 259444736 authenticatee.cpp:299] Authentication success
> I0517 10:43:58.168159 260517888 master.cpp:7505] Successfully authenticated
> principal 'test-principal' at
> [email protected]:51870
> I0517 10:43:58.168407 263200768 sched.cpp:513] Successfully authenticated
> with master [email protected]:51870
> I0517 10:43:58.168596 261054464 master.cpp:2813] Received SUBSCRIBE call for
> framework 'default' at
> [email protected]:51870
> I0517 10:43:58.168611 261054464 master.cpp:2197] Authorizing framework
> principal 'test-principal' to receive offers for roles '{ * }'
> I0517 10:43:58.168609 261591040 provisioner.cpp:410] Provisioner recovery
> complete
> I0517 10:43:58.168901 259981312 master.cpp:2813] Received SUBSCRIBE call for
> framework 'default' at
> [email protected]:51870
> I0517 10:43:58.168915 259981312 master.cpp:2197] Authorizing framework
> principal 'test-principal' to receive offers for roles '{ * }'
> I0517 10:43:58.168923 261054464 slave.cpp:5974] Finished recovery
> I0517 10:43:58.169025 259981312 master.cpp:2890] Subscribing framework
> default with checkpointing disabled and capabilities [ REVOCABLE_RESOURCES ]
> I0517 10:43:58.169322 262664192 hierarchical.cpp:273] Added framework
> a70cd84f-96ed-417f-8285-04416cf4ecb5-0000
> I0517 10:43:58.169355 261591040 sched.cpp:759] Framework registered with
> a70cd84f-96ed-417f-8285-04416cf4ecb5-0000
> I0517 10:43:58.169375 259981312 master.cpp:2890] Subscribing framework
> default with checkpointing disabled and capabilities [ REVOCABLE_RESOURCES ]
> I0517 10:43:58.169392 259981312 master.cpp:2900] Framework
> a70cd84f-96ed-417f-8285-04416cf4ecb5-0000 (default) at
> [email protected]:51870 already
> subscribed, resending acknowledgement
> I0517 10:43:58.169566 261054464 status_update_manager.cpp:177] Pausing
> sending status updates
> I0517 10:43:58.169572 260517888 slave.cpp:922] New master detected at
> [email protected]:51870
> I0517 10:43:58.169620 260517888 slave.cpp:957] Detecting new master
> I0517 10:43:58.169721 260517888 slave.cpp:984] Authenticating with master
> [email protected]:51870
> I0517 10:43:58.169747 260517888 slave.cpp:995] Using default CRAM-MD5
> authenticatee
> I0517 10:43:58.169876 259981312 authenticatee.cpp:121] Creating new client
> SASL connection
> I0517 10:43:58.169983 262664192 master.cpp:7475] Authenticating
> slave(7)@169.254.161.216:51870
> I0517 10:43:58.170220 260517888 authenticator.cpp:98] Creating new server
> SASL connection
> I0517 10:43:58.170321 261054464 authenticatee.cpp:213] Received SASL
> authentication mechanisms: CRAM-MD5
> I0517 10:43:58.170338 261054464 authenticatee.cpp:239] Attempting to
> authenticate with mechanism 'CRAM-MD5'
> I0517 10:43:58.170450 262664192 authenticator.cpp:204] Received SASL
> authentication start
> I0517 10:43:58.170491 262664192 authenticator.cpp:326] Authentication
> requires more steps
> I0517 10:43:58.170567 262664192 authenticatee.cpp:259] Received SASL
> authentication step
> I0517 10:43:58.170634 260517888 authenticator.cpp:232] Received SASL
> authentication step
> I0517 10:43:58.170662 260517888 authenticator.cpp:318] Authentication success
> I0517 10:43:58.170758 261591040 authenticatee.cpp:299] Authentication success
> I0517 10:43:58.170771 261054464 master.cpp:7505] Successfully authenticated
> principal 'test-principal' at slave(7)@169.254.161.216:51870
> I0517 10:43:58.171092 262127616 slave.cpp:1079] Successfully authenticated
> with master [email protected]:51870
> I0517 10:43:58.171214 259981312 master.cpp:5429] Received register agent
> message from slave(7)@169.254.161.216:51870 (neils-macbook-pro.local)
> I0517 10:43:58.171231 259981312 master.cpp:3659] Authorizing agent with
> principal 'test-principal'
> I0517 10:43:58.171458 263200768 master.cpp:5564] Registering agent at
> slave(7)@169.254.161.216:51870 (neils-macbook-pro.local) with id
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0
> I0517 10:43:58.171792 261054464 registrar.cpp:493] Applied 1 operations in
> 81us; attempting to update the registry
> I0517 10:43:58.172471 261054464 registrar.cpp:550] Successfully updated the
> registry in 0ns
> I0517 10:43:58.172878 259444736 master.cpp:5639] Registered agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 at slave(7)@169.254.161.216:51870
> (neils-macbook-pro.local) with cpus(*):2; mem(*):1024; disk(*):1024;
> ports(*):[31000-32000]
> I0517 10:43:58.172924 262664192 slave.cpp:1125] Registered with master
> [email protected]:51870; given agent ID
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0
> I0517 10:43:58.173074 261054464 status_update_manager.cpp:184] Resuming
> sending status updates
> I0517 10:43:58.173394 259981312 hierarchical.cpp:525] Added agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 (neils-macbook-pro.local) with
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
> I0517 10:43:58.174643 261591040 master.cpp:7305] Sending 1 offers to
> framework a70cd84f-96ed-417f-8285-04416cf4ecb5-0000 (default) at
> [email protected]:51870
> I0517 10:43:58.175052 259444736 slave.cpp:6205] Forwarding total
> oversubscribed resources cpus(*){REV}:1
> I0517 10:43:58.175165 259981312 master.cpp:6324] Received update of agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 at slave(7)@169.254.161.216:51870
> (neils-macbook-pro.local) with total oversubscribed resources cpus(*){REV}:1
> I0517 10:43:58.175364 262664192 hierarchical.cpp:620] Agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 (neils-macbook-pro.local) updated
> with oversubscribed resources cpus(*){REV}:1 (total: cpus(*):2; mem(*):1024;
> disk(*):1024; ports(*):[31000-32000]; cpus(*){REV}:1, allocated:
> cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated:
> *):1024; ports(*)(allocated: *):[31000-32000])
> I0517 10:43:58.175796 261054464 master.cpp:7305] Sending 1 offers to
> framework a70cd84f-96ed-417f-8285-04416cf4ecb5-0000 (default) at
> [email protected]:51870
> I0517 10:43:58.176827 259444736 slave.cpp:6205] Forwarding total
> oversubscribed resources cpus(*){REV}:3
> I0517 10:43:58.177014 260517888 master.cpp:6324] Received update of agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 at slave(7)@169.254.161.216:51870
> (neils-macbook-pro.local) with total oversubscribed resources cpus(*){REV}:3
> I0517 10:43:58.177314 263200768 hierarchical.cpp:620] Agent
> a70cd84f-96ed-417f-8285-04416cf4ecb5-S0 (neils-macbook-pro.local) updated
> with oversubscribed resources cpus(*){REV}:3 (total: cpus(*):2; mem(*):1024;
> disk(*):1024; ports(*):[31000-32000]; cpus(*){REV}:3, allocated:
> cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated:
> *):1024; ports(*)(allocated: *):[31000-32000]; cpus(*)(allocated: *){REV}:1)
> I0517 10:43:58.177306 260517888 master.cpp:6344] Removing offer
> a70cd84f-96ed-417f-8285-04416cf4ecb5-O1 with revocable resources
> cpus(*)(allocated: *){REV}:1 on agent a70cd84f-96ed-417f-8285-04416cf4ecb5-S0
> at slave(7)@169.254.161.216:51870 (neils-macbook-pro.local)
> I0517 10:43:58.177935 260517888 master.cpp:7305] Sending 1 offers to
> framework a70cd84f-96ed-417f-8285-04416cf4ecb5-0000 (default) at
> [email protected]:51870
> I0517 10:43:58.178577 259444736 master.cpp:7305] Sending 1 offers to
> framework a70cd84f-96ed-417f-8285-04416cf4ecb5-0000 (default) at
> [email protected]:51870
> ../../mesos/src/tests/oversubscription_tests.cpp:552: Failure
> Expected: allocatedResources(resources2, framework.role())
> Which is: { cpus(*)(allocated: *){REV}:3 }
> To be equal to: resources3
> Which is: { cpus(*)(allocated: *){REV}:2 }
> *** Aborted at 1495043038 (unix time) try "date -d @1495043038" if you are
> using GNU date ***
> PC: @ 0x109064010 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 68930 (TID 0x7fffae7fb3c0) stack trace: ***
> @ 0x7fffa5af0b3a _sigtramp
> @ 0x10 (unknown)
> @ 0x109063817 testing::internal::AssertHelper::operator=()
> @ 0x108431e2f
> mesos::internal::tests::OversubscriptionTest_RescindRevocableOfferWithIncreasedRevocable_Test::TestBody()
> @ 0x1090d2a0a
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x10907b697
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10907b5d5 testing::Test::Run()
> @ 0x10907d328 testing::TestInfo::Run()
> @ 0x10907e897 testing::TestCase::Run()
> @ 0x10908e85c testing::internal::UnitTestImpl::RunAllTests()
> @ 0x1090d4cca
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x10908e287
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10908e158 testing::UnitTest::Run()
> @ 0x107e80971 RUN_ALL_TESTS()
> @ 0x107e7c4b1 main
> @ 0x7fffa58e1235 start
> @ 0x5 (unknown)
> zsh: segmentation fault ./src/mesos-tests --gtest_repeat=1000
> --gtest_break_on_failure --verbose
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)