[ 
https://issues.apache.org/jira/browse/MESOS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012893#comment-16012893
 ] 

Neil Conway commented on MESOS-7516:
------------------------------------

cc [~flx42]

> HookTest.VerifySlaveResourcesAndAttributesDecorator is flaky
> ------------------------------------------------------------
>
>                 Key: MESOS-7516
>                 URL: https://issues.apache.org/jira/browse/MESOS-7516
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> Takes a few hundred iterations to repro, but does repro consistently:
> {noformat}
> [ RUN      ] HookTest.VerifySlaveResourcesAndAttributesDecorator
> I0516 11:32:43.248517 27528 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0516 11:32:43.263743 27551 master.cpp:436] Master 
> e6c479e5-b7e6-439e-a7ad-018faf297fad (core-dev) started on 10.0.49.2:33039
> I0516 11:32:43.263772 27551 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/McnBom/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/McnBom/master" 
> --zk_session_timeout="10secs"
> I0516 11:32:43.263958 27551 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0516 11:32:43.263996 27551 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0516 11:32:43.264010 27551 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0516 11:32:43.264025 27551 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/McnBom/credentials'
> I0516 11:32:43.264264 27551 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0516 11:32:43.264365 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0516 11:32:43.264456 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0516 11:32:43.264750 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0516 11:32:43.264885 27551 master.cpp:640] Authorization enabled
> I0516 11:32:43.267530 27581 master.cpp:2161] Elected as the leading master!
> I0516 11:32:43.267578 27581 master.cpp:1700] Recovering from registrar
> I0516 11:32:43.268239 27570 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 477952ns
> I0516 11:32:43.268348 27570 registrar.cpp:493] Applied 1 operations in 
> 15690ns; attempting to update the registry
> I0516 11:32:43.268817 27570 registrar.cpp:550] Successfully updated the 
> registry in 409344ns
> I0516 11:32:43.268924 27570 registrar.cpp:422] Successfully recovered 
> registrar
> I0516 11:32:43.269623 27568 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0516 11:32:43.288718 27528 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0516 11:32:43.289685 27572 slave.cpp:225] Mesos agent started on 
> (123)@10.0.49.2:33039
> I0516 11:32:43.289724 27572 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" 
> --credential="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/credential"
>  --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/fetch"
>  --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
> --hostname_lookup="true" --http_command_executor="false" 
> --http_credentials="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/http_credentials"
>  --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/home/nrc/build-mesos-default-opts/src" --logbufsecs="0" 
> --logging_level="INFO" --max_completed_executors_per_framework="150" 
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
> --registration_backoff_factor="10ms" 
> --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" 
> --revocable_cpu_low_priority="true" 
> --runtime_dir="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ"
>  --sandbox_directory="/mnt/mesos/sandbox" --strict="true" 
> --switch_user="true" --systemd_enable_support="true" 
> --systemd_runtime_directory="/run/systemd/system" --version="false" 
> --work_dir="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_S5tlWF"
> I0516 11:32:43.290087 27572 credentials.hpp:86] Loading credential for 
> authentication from 
> '/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/credential'
> I0516 11:32:43.290268 27572 slave.cpp:258] Agent using credential for: 
> test-principal
> I0516 11:32:43.290307 27572 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/http_credentials'
> I0516 11:32:43.290540 27572 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-agent-readonly'
> I0516 11:32:43.290654 27572 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-agent-readwrite'
> I0516 11:32:43.325108 27572 test_hook_module.cpp:324] Executing 
> 'slaveResourcesDecorator' hook
> I0516 11:32:43.325413 27572 slave.cpp:529] Agent resources: mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000]; cpus(*):4; foo(*):{bar, baz}
> I0516 11:32:43.325495 27572 test_hook_module.cpp:346] Executing 
> 'slaveAttributesDecorator' hook
> I0516 11:32:43.325573 27572 slave.cpp:537] Agent attributes: [ rack=rack1 ]
> I0516 11:32:43.325597 27572 slave.cpp:542] Agent hostname: core-dev
> I0516 11:32:43.325727 27546 status_update_manager.cpp:177] Pausing sending 
> status updates
> I0516 11:32:43.326869 27546 state.cpp:62] Recovering state from 
> '/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_S5tlWF/meta'
> I0516 11:32:43.327117 27588 status_update_manager.cpp:203] Recovering status 
> update manager
> I0516 11:32:43.327709 27564 slave.cpp:5974] Finished recovery
> I0516 11:32:43.328343 27564 slave.cpp:922] New master detected at 
> master@10.0.49.2:33039
> I0516 11:32:43.328361 27580 status_update_manager.cpp:177] Pausing sending 
> status updates
> I0516 11:32:43.328466 27564 slave.cpp:957] Detecting new master
> I0516 11:32:43.331831 27586 slave.cpp:984] Authenticating with master 
> master@10.0.49.2:33039
> I0516 11:32:43.331959 27586 slave.cpp:995] Using default CRAM-MD5 
> authenticatee
> I0516 11:32:43.332280 27587 authenticatee.cpp:121] Creating new client SASL 
> connection
> I0516 11:32:43.348698 27528 sched.cpp:232] Version: 1.4.0
> I0516 11:32:43.349016 27583 sched.cpp:336] New master detected at 
> master@10.0.49.2:33039
> I0516 11:32:43.349156 27583 sched.cpp:407] Authenticating with master 
> master@10.0.49.2:33039
> I0516 11:32:43.349231 27583 sched.cpp:414] Using default CRAM-MD5 
> authenticatee
> I0516 11:32:43.349472 27543 authenticatee.cpp:121] Creating new client SASL 
> connection
> I0516 11:32:43.351956 27555 master.cpp:7475] Authenticating 
> slave(123)@10.0.49.2:33039
> I0516 11:32:43.352265 27567 authenticator.cpp:98] Creating new server SASL 
> connection
> I0516 11:32:43.381402 27548 master.cpp:7475] Authenticating 
> [email protected]:33039
> I0516 11:32:43.381651 27548 authenticator.cpp:98] Creating new server SASL 
> connection
> I0516 11:32:43.397202 27577 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0516 11:32:43.397243 27577 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0516 11:32:43.398547 27582 authenticator.cpp:204] Received SASL 
> authentication start
> I0516 11:32:43.398628 27582 authenticator.cpp:326] Authentication requires 
> more steps
> I0516 11:32:43.398751 27576 authenticatee.cpp:259] Received SASL 
> authentication step
> I0516 11:32:43.398907 27563 authenticator.cpp:232] Received SASL 
> authentication step
> I0516 11:32:43.399018 27563 authenticator.cpp:318] Authentication success
> I0516 11:32:43.399263 27548 authenticatee.cpp:299] Authentication success
> I0516 11:32:43.399318 27547 master.cpp:7505] Successfully authenticated 
> principal 'test-principal' at 
> [email protected]:33039
> I0516 11:32:43.399978 27557 sched.cpp:513] Successfully authenticated with 
> master master@10.0.49.2:33039
> I0516 11:32:43.400215 27582 master.cpp:2813] Received SUBSCRIBE call for 
> framework 'default' at 
> [email protected]:33039
> I0516 11:32:43.400296 27582 master.cpp:2197] Authorizing framework principal 
> 'test-principal' to receive offers for roles '{ * }'
> I0516 11:32:43.400681 27548 master.cpp:2890] Subscribing framework default 
> with checkpointing disabled and capabilities [  ]
> I0516 11:32:43.401257 27549 hierarchical.cpp:273] Added framework 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-0000
> I0516 11:32:43.401258 27576 sched.cpp:759] Framework registered with 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-0000
> W0516 11:32:48.333199 27565 slave.cpp:1098] Authentication timed out
> W0516 11:32:48.333492 27554 slave.cpp:1043] Failed to authenticate with 
> master master@10.0.49.2:33039: Authentication discarded
> W0516 11:32:48.352715 27587 master.cpp:7522] Authentication timed out
> I0516 11:32:48.726925 27582 slave.cpp:984] Authenticating with master 
> master@10.0.49.2:33039
> I0516 11:32:48.726980 27582 slave.cpp:995] Using default CRAM-MD5 
> authenticatee
> I0516 11:32:48.727138 27553 authenticatee.cpp:121] Creating new client SASL 
> connection
> I0516 11:32:48.758314 27582 master.cpp:7461] Queuing up authentication 
> request from slave(123)@10.0.49.2:33039 because authentication is still in 
> progress
> W0516 11:32:53.439565 27543 master.cpp:7502] Failed to authenticate 
> slave(123)@10.0.49.2:33039: Failed to communicate with authenticatee
> I0516 11:32:53.439667 27543 master.cpp:7475] Authenticating 
> slave(123)@10.0.49.2:33039
> I0516 11:32:53.440021 27550 authenticator.cpp:98] Creating new server SASL 
> connection
> W0516 11:32:53.727674 27543 slave.cpp:1098] Authentication timed out
> W0516 11:32:53.728672 27568 slave.cpp:1043] Failed to authenticate with 
> master master@10.0.49.2:33039: Authentication discarded
> I0516 11:32:54.270351 27582 slave.cpp:984] Authenticating with master 
> master@10.0.49.2:33039
> I0516 11:32:54.270424 27582 slave.cpp:995] Using default CRAM-MD5 
> authenticatee
> I0516 11:32:54.270653 27558 authenticatee.cpp:121] Creating new client SASL 
> connection
> I0516 11:32:54.316350 27562 master.cpp:7461] Queuing up authentication 
> request from slave(123)@10.0.49.2:33039 because authentication is still in 
> progress
> ../../mesos/src/tests/hook_tests.cpp:1105: Failure
> Failed to wait 15secs for offers
> *** Aborted at 1494959578 (unix time) try "date -d @1494959578" if you are 
> using GNU date ***
> PC: @          0x1f872d6 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27528 (TID 0x7f40b745f8c0) from PID 0; 
> stack trace: ***
>     @     0x7f40ae4e1370 (unknown)
>     @          0x1f872d6 testing::UnitTest::AddTestPartResult()
>     @          0x1f7be4d testing::internal::AssertHelper::operator=()
>     @          0x12703d5 
> mesos::internal::tests::HookTest_VerifySlaveResourcesAndAttributesDecorator_Test::TestBody()
>     @          0x1fa495c 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1f9fa44 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1f8110b testing::Test::Run()
>     @          0x1f8188e testing::TestInfo::Run()
>     @          0x1f81ed4 testing::TestCase::Run()
>     @          0x1f887ae testing::internal::UnitTestImpl::RunAllTests()
> W0516 11:32:58.456420 27547 master.cpp:7502] Failed to authenticate 
> slave(123)@10.0.49.2:33039: Failed to communicate with authenticatee
> I0516 11:32:58.456518 27547 master.cpp:7475] Authenticating 
> slave(123)@10.0.49.2:33039
> I0516 11:32:58.456859 27561 authenticator.cpp:98] Creating new server SASL 
> connection
>     @          0x1fa5581 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> I0516 11:32:58.463368 27545 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0516 11:32:58.463418 27545 authenticatee.cpp:239] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0516 11:32:58.463577 27556 authenticator.cpp:204] Received SASL 
> authentication start
> I0516 11:32:58.463692 27556 authenticator.cpp:326] Authentication requires 
> more steps
> I0516 11:32:58.463830 27546 authenticatee.cpp:259] Received SASL 
> authentication step
> I0516 11:32:58.463982 27574 authenticator.cpp:232] Received SASL 
> authentication step
> I0516 11:32:58.464081 27574 authenticator.cpp:318] Authentication success
> I0516 11:32:58.464236 27568 authenticatee.cpp:299] Authentication success
> I0516 11:32:58.464280 27583 master.cpp:7505] Successfully authenticated 
> principal 'test-principal' at slave(123)@10.0.49.2:33039
> I0516 11:32:58.464535 27587 slave.cpp:1079] Successfully authenticated with 
> master master@10.0.49.2:33039
> I0516 11:32:58.465075 27573 master.cpp:5429] Received register agent message 
> from slave(123)@10.0.49.2:33039 (core-dev)
> I0516 11:32:58.465240 27573 master.cpp:3659] Authorizing agent with principal 
> 'test-principal'
> I0516 11:32:58.465688 27547 master.cpp:5564] Registering agent at 
> slave(123)@10.0.49.2:33039 (core-dev) with id 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-S0
> I0516 11:32:58.466064 27546 registrar.cpp:493] Applied 1 operations in 
> 54134ns; attempting to update the registry
> I0516 11:32:58.466888 27548 registrar.cpp:550] Successfully updated the 
> registry in 766976ns
> I0516 11:32:58.467787 27570 slave.cpp:1125] Registered with master 
> master@10.0.49.2:33039; given agent ID e6c479e5-b7e6-439e-a7ad-018faf297fad-S0
> I0516 11:32:58.467732 27550 master.cpp:5639] Registered agent 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-S0 at slave(123)@10.0.49.2:33039 
> (core-dev) with mem(*):1024; disk(*):1024; ports(*):[31000-32000]; cpus(*):4; 
> foo(*):{bar, baz}
> I0516 11:32:58.467985 27544 status_update_manager.cpp:184] Resuming sending 
> status updates
> I0516 11:32:58.468003 27560 hierarchical.cpp:525] Added agent 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-S0 (core-dev) with mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000]; cpus(*):4; foo(*):{bar, baz} 
> (allocated: {})
> I0516 11:32:58.468586 27570 slave.cpp:1191] Forwarding total oversubscribed 
> resources {}
> I0516 11:32:58.468780 27567 master.cpp:6324] Received update of agent 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-S0 at slave(123)@10.0.49.2:33039 
> (core-dev) with total oversubscribed resources {}
> I0516 11:32:58.469619 27556 master.cpp:7305] Sending 1 offers to framework 
> e6c479e5-b7e6-439e-a7ad-018faf297fad-0000 (default) at 
> [email protected]:33039
>     @          0x1fa05c2 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1f874f4 testing::UnitTest::Run()
>     @          0x135aa4b RUN_ALL_TESTS()
>     @          0x135a51c main
>     @     0x7f40ad2efb35 __libc_start_main
>     @           0xb02d99 (unknown)
> zsh: segmentation fault (core dumped)  ./src/mesos-tests  --verbose 
> --gtest_break_on_failure --gtest_repeat=500
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
