Neil Conway created MESOS-7517:
----------------------------------
Summary: HealthCheckTest.ConsecutiveFailures is flaky
Key: MESOS-7517
URL: https://issues.apache.org/jira/browse/MESOS-7517
Project: Mesos
Issue Type: Bug
Reporter: Neil Conway
{noformat}
[ RUN ] HealthCheckTest.ConsecutiveFailures
I0516 17:12:44.380421 28941 cluster.cpp:162] Creating default 'local' authorizer
I0516 17:12:44.389566 28996 master.cpp:436] Master
2b745611-28cc-491b-80ea-2b6e94a9cab8 (core-dev) started on 10.0.49.2:37598
I0516 17:12:44.389619 28996 master.cpp:438] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/kYELQI/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/kYELQI/master"
--zk_session_timeout="10secs"
I0516 17:12:44.389943 28996 master.cpp:488] Master only allowing authenticated
frameworks to register
I0516 17:12:44.389971 28996 master.cpp:502] Master only allowing authenticated
agents to register
I0516 17:12:44.389988 28996 master.cpp:515] Master only allowing authenticated
HTTP frameworks to register
I0516 17:12:44.390012 28996 credentials.hpp:37] Loading credentials for
authentication from '/tmp/kYELQI/credentials'
I0516 17:12:44.390353 28996 master.cpp:560] Using default 'crammd5'
authenticator
I0516 17:12:44.390504 28996 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0516 17:12:44.390661 28996 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0516 17:12:44.390993 28996 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0516 17:12:44.391158 28996 master.cpp:640] Authorization enabled
I0516 17:12:44.393784 28958 master.cpp:2161] Elected as the leading master!
I0516 17:12:44.393831 28958 master.cpp:1700] Recovering from registrar
I0516 17:12:44.394521 28969 registrar.cpp:389] Successfully fetched the
registry (0B) in 536064ns
I0516 17:12:44.394621 28969 registrar.cpp:493] Applied 1 operations in 16653ns;
attempting to update the registry
I0516 17:12:44.395346 28969 registrar.cpp:550] Successfully updated the
registry in 664832ns
I0516 17:12:44.395448 28969 registrar.cpp:422] Successfully recovered registrar
I0516 17:12:44.395992 28958 master.cpp:1799] Recovered 0 agents from the
registry (119B); allowing 10mins for agents to re-register
I0516 17:12:44.404881 28941 containerizer.cpp:221] Using isolation:
posix/cpu,posix/mem,filesystem/posix,network/cni
W0516 17:12:44.405333 28941 backend.cpp:76] Failed to create 'overlay' backend:
OverlayBackend requires root privileges
W0516 17:12:44.405426 28941 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0516 17:12:44.405462 28941 provisioner.cpp:249] Using default backend 'copy'
I0516 17:12:44.406657 28941 cluster.cpp:448] Creating default 'local' authorizer
I0516 17:12:44.407929 28989 slave.cpp:225] Mesos agent started on
(203)@10.0.49.2:37598
I0516 17:12:44.407973 28989 slave.cpp:226] Flags at startup: --acls=""
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos"
--credential="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_credentials="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="posix/cpu,posix/mem" --launcher="posix"
--launcher_dir="/home/nrc/build-mesos-default-opts/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod"
I0516 17:12:44.408372 28989 credentials.hpp:86] Loading credential for
authentication from '/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential'
I0516 17:12:44.408543 28989 slave.cpp:258] Agent using credential for:
test-principal
I0516 17:12:44.408593 28989 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials'
I0516 17:12:44.408852 28989 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readonly'
I0516 17:12:44.409008 28989 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0516 17:12:44.414839 28989 slave.cpp:529] Agent resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0516 17:12:44.414953 28989 slave.cpp:537] Agent attributes: [ ]
I0516 17:12:44.414980 28989 slave.cpp:542] Agent hostname: core-dev
I0516 17:12:44.415108 28961 status_update_manager.cpp:177] Pausing sending
status updates
I0516 17:12:44.416466 28961 state.cpp:62] Recovering state from
'/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod/meta'
I0516 17:12:44.416718 28958 status_update_manager.cpp:203] Recovering status
update manager
I0516 17:12:44.417064 28960 containerizer.cpp:608] Recovering containerizer
I0516 17:12:44.419234 28976 provisioner.cpp:410] Provisioner recovery complete
I0516 17:12:44.419749 28986 slave.cpp:5974] Finished recovery
I0516 17:12:44.420372 28998 status_update_manager.cpp:177] Pausing sending
status updates
I0516 17:12:44.420370 28986 slave.cpp:922] New master detected at
master@10.0.49.2:37598
I0516 17:12:44.420516 28986 slave.cpp:957] Detecting new master
I0516 17:12:44.424572 28941 sched.cpp:232] Version: 1.4.0
I0516 17:12:44.425042 28995 sched.cpp:336] New master detected at
master@10.0.49.2:37598
I0516 17:12:44.425138 28995 sched.cpp:407] Authenticating with master
master@10.0.49.2:37598
I0516 17:12:44.425168 28995 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0516 17:12:44.425364 28958 authenticatee.cpp:121] Creating new client SASL
connection
I0516 17:12:44.429754 28999 slave.cpp:984] Authenticating with master
master@10.0.49.2:37598
I0516 17:12:44.429811 28999 slave.cpp:995] Using default CRAM-MD5 authenticatee
I0516 17:12:44.429942 28955 authenticatee.cpp:121] Creating new client SASL
connection
I0516 17:12:44.437100 28984 master.cpp:7475] Authenticating
slave(203)@10.0.49.2:37598
I0516 17:12:44.437371 28965 authenticator.cpp:98] Creating new server SASL
connection
W0516 17:12:49.426436 28956 sched.cpp:537] Authentication timed out
W0516 17:12:49.430752 28985 slave.cpp:1098] Authentication timed out
W0516 17:12:49.431509 28973 slave.cpp:1043] Failed to authenticate with master
master@10.0.49.2:37598: Authentication discarded
W0516 17:12:49.437960 29000 master.cpp:7522] Authentication timed out
I0516 17:12:49.442778 28996 master.cpp:7475] Authenticating
[email protected]:37598
I0516 17:12:49.443080 28995 authenticator.cpp:98] Creating new server SASL
connection
I0516 17:12:49.443548 28966 sched.cpp:477] Failed to authenticate with master
master@10.0.49.2:37598: Authentication discarded
W0516 17:12:49.449880 28964 master.cpp:7502] Failed to authenticate
[email protected]:37598: Failed to
communicate with authenticatee
I0516 17:12:49.888478 29000 slave.cpp:984] Authenticating with master
master@10.0.49.2:37598
I0516 17:12:49.888593 29000 slave.cpp:995] Using default CRAM-MD5 authenticatee
I0516 17:12:49.888759 28995 authenticatee.cpp:121] Creating new client SASL
connection
I0516 17:12:49.896517 28995 master.cpp:7461] Queuing up authentication request
from slave(203)@10.0.49.2:37598 because authentication is still in progress
I0516 17:12:51.343961 28977 sched.cpp:407] Authenticating with master
master@10.0.49.2:37598
I0516 17:12:51.344002 28977 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0516 17:12:51.344451 29000 authenticatee.cpp:121] Creating new client SASL
connection
I0516 17:12:51.373108 29001 master.cpp:7475] Authenticating
[email protected]:37598
I0516 17:12:51.373463 28975 authenticator.cpp:98] Creating new server SASL
connection
I0516 17:12:51.415412 28957 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0516 17:12:51.415469 28957 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0516 17:12:51.415738 28978 authenticator.cpp:204] Received SASL authentication
start
I0516 17:12:51.415832 28978 authenticator.cpp:326] Authentication requires more
steps
I0516 17:12:51.415956 28969 authenticatee.cpp:259] Received SASL authentication
step
I0516 17:12:51.416134 28996 authenticator.cpp:232] Received SASL authentication
step
I0516 17:12:51.416249 28996 authenticator.cpp:318] Authentication success
I0516 17:12:51.416415 28970 master.cpp:7505] Successfully authenticated
principal 'test-principal' at
[email protected]:37598
I0516 17:12:51.416525 28964 authenticatee.cpp:299] Authentication success
I0516 17:12:51.416913 28980 sched.cpp:513] Successfully authenticated with
master master@10.0.49.2:37598
I0516 17:12:51.417172 28987 master.cpp:2813] Received SUBSCRIBE call for
framework 'default' at
[email protected]:37598
I0516 17:12:51.417279 28987 master.cpp:2197] Authorizing framework principal
'test-principal' to receive offers for roles '{ * }'
I0516 17:12:51.417778 29001 master.cpp:2890] Subscribing framework default with
checkpointing disabled and capabilities [ ]
I0516 17:12:51.418303 29002 sched.cpp:759] Framework registered with
2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
I0516 17:12:51.418393 28958 hierarchical.cpp:273] Added framework
2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
W0516 17:12:54.888931 28985 slave.cpp:1098] Authentication timed out
W0516 17:12:54.889354 28985 slave.cpp:1043] Failed to authenticate with master
master@10.0.49.2:37598: Authentication discarded
I0516 17:12:55.118023 28973 slave.cpp:984] Authenticating with master
master@10.0.49.2:37598
I0516 17:12:55.118098 28973 slave.cpp:995] Using default CRAM-MD5 authenticatee
I0516 17:12:55.118614 28967 authenticatee.cpp:121] Creating new client SASL
connection
../../mesos/src/tests/health_check_tests.cpp:957: Failure
Failed to wait 15secs for offers
*** Aborted at 1494979979 (unix time) try "date -d @1494979979" if you are
using GNU date ***
PC: @ 0x2011328 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 28941 (TID 0x7f3981a4a8c0) from PID 0; stack
trace: ***
@ 0x7f3978acc370 (unknown)
W0516 17:12:59.454641 28978 master.cpp:7502] Failed to authenticate
slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
I0516 17:12:59.454766 28978 master.cpp:7475] Authenticating
slave(203)@10.0.49.2:37598
W0516 17:12:59.455497 28958 master.cpp:7502] Failed to authenticate
slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
@ 0x2011328 testing::UnitTest::AddTestPartResult()
@ 0x2004467 testing::internal::AssertHelper::operator=()
@ 0x11ca5d0
mesos::internal::tests::HealthCheckTest_ConsecutiveFailures_Test::TestBody()
@ 0x2030820
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x202ae80
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x200b04d testing::Test::Run()
@ 0x200b866 testing::TestInfo::Run()
@ 0x200beac testing::TestCase::Run()
@ 0x2012800 testing::internal::UnitTestImpl::RunAllTests()
@ 0x2031445
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x202b9fe
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x2011546 testing::UnitTest::Run()
@ 0x138ca1b RUN_ALL_TESTS()
@ 0x138c4ec main
@ 0x7f39778dab35 __libc_start_main
@ 0xb0a049 (unknown)
zsh: segmentation fault (core dumped) ./src/mesos-tests
--gtest_filter="HealthCheckTest.ConsecutiveFailures"
{noformat}
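The log above interleaves master, agent, and scheduler output (and the copy here is line-wrapped), so when triaging this flake it can help to pull out just the authentication events. The following is a minimal sketch of such a filter. It is a hypothetical helper, not part of Mesos or its test suite, and the names `LogRecord`, `parseGlogLine`, and `authenticationTimeline` are illustrative only. It assumes the raw test output has one glog record per line in the form `<severity><MMDD> <time> <thread> <file>:<line>] <message>`.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One parsed glog record (hypothetical helper type, not a Mesos API).
struct LogRecord {
  char severity;        // 'I', 'W', 'E' or 'F'
  std::string time;     // e.g. "17:12:49.426436"
  std::string source;   // e.g. "sched.cpp:537"
  std::string message;  // text after the "] " separator
};

// Parses one glog-formatted line such as
//   W0516 17:12:49.426436 28956 sched.cpp:537] Authentication timed out
// Returns std::nullopt for lines that do not match (stack frames, etc.).
std::optional<LogRecord> parseGlogLine(const std::string& line) {
  static const std::regex pattern(
      R"(^([IWEF])\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) +\d+ ([^\]]+)\] (.*)$)");
  std::smatch m;
  if (!std::regex_match(line, m, pattern)) {
    return std::nullopt;
  }
  return LogRecord{m[1].str()[0], m[2].str(), m[3].str(), m[4].str()};
}

// Keeps only records whose message mentions authentication (case-insensitive
// match on "authenticat"), in the order they appear in the log.
std::vector<LogRecord> authenticationTimeline(const std::string& log) {
  static const std::regex auth("authenticat", std::regex::icase);
  std::vector<LogRecord> events;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    std::optional<LogRecord> record = parseGlogLine(line);
    if (record && std::regex_search(record->message, auth)) {
      events.push_back(*record);
    }
  }
  return events;
}
```

Run over the raw output of this test, a filter like this would surface, for example, the agent's first two attempts ending in "Authentication discarded" while the scheduler authenticates only on its second attempt at 17:12:51.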
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)