Benjamin Bannier created MESOS-8733:
---------------------------------------

             Summary: OversubscriptionTest.ForwardUpdateSlaveMessage is flaky
                 Key: MESOS-8733
                 URL: https://issues.apache.org/jira/browse/MESOS-8733
             Project: Mesos
          Issue Type: Bug
          Components: test
    Affects Versions: 1.6.0
            Reporter: Benjamin Bannier
            Assignee: Benjamin Bannier


Observed this failure in CI:
{noformat}
[ RUN ] OversubscriptionTest.ForwardUpdateSlaveMessage
3: I0327 10:12:04.032042 18320 cluster.cpp:172] Creating default 'local' 
authorizer
3: I0327 10:12:04.035696 18321 master.cpp:463] Master 
b5c97327-11cc-4183-82ed-75e62b71cc58 (1931c74e0c4c) started on 172.17.0.2:35020
3: I0327 10:12:04.035732 18321 master.cpp:466] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/4j65Va/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/4j65Va/master" --zk_session_timeout="10secs"
3: I0327 10:12:04.036129 18321 master.cpp:515] Master only allowing 
authenticated frameworks to register
3: I0327 10:12:04.036140 18321 master.cpp:521] Master only allowing 
authenticated agents to register
3: I0327 10:12:04.036147 18321 master.cpp:527] Master only allowing 
authenticated HTTP frameworks to register
3: I0327 10:12:04.036156 18321 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/4j65Va/credentials'
3: I0327 10:12:04.036468 18321 master.cpp:571] Using default 'crammd5' 
authenticator
3: I0327 10:12:04.036643 18321 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
3: I0327 10:12:04.036834 18321 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
3: I0327 10:12:04.037005 18321 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
3: I0327 10:12:04.037170 18321 master.cpp:652] Authorization enabled
3: I0327 10:12:04.037370 18338 whitelist_watcher.cpp:77] No whitelist given
3: I0327 10:12:04.037374 18322 hierarchical.cpp:175] Initialized hierarchical 
allocator process
3: I0327 10:12:04.040787 18321 master.cpp:2126] Elected as the leading master!
3: I0327 10:12:04.040812 18321 master.cpp:1682] Recovering from registrar
3: I0327 10:12:04.040966 18342 registrar.cpp:347] Recovering registrar
3: I0327 10:12:04.041606 18330 registrar.cpp:391] Successfully fetched the 
registry (0B) in 590848ns
3: I0327 10:12:04.041764 18330 registrar.cpp:495] Applied 1 operations in 
57052ns; attempting to update the registry
3: I0327 10:12:04.042466 18330 registrar.cpp:552] Successfully updated the 
registry in 638976ns
3: I0327 10:12:04.042615 18330 registrar.cpp:424] Successfully recovered 
registrar
3: I0327 10:12:04.043128 18339 master.cpp:1796] Recovered 0 agents from the 
registry (135B); allowing 10mins for agents to reregister
3: I0327 10:12:04.043151 18326 hierarchical.cpp:213] Skipping recovery of 
hierarchical allocator: nothing to recover
3: W0327 10:12:04.048898 18320 process.cpp:2805] Attempted to spawn already 
running process [email protected]:35020
3: I0327 10:12:04.050076 18320 containerizer.cpp:304] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
3: W0327 10:12:04.050720 18320 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
3: W0327 10:12:04.050746 18320 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
3: I0327 10:12:04.050791 18320 provisioner.cpp:299] Using default backend 'copy'
3: I0327 10:12:04.053491 18320 cluster.cpp:460] Creating default 'local' 
authorizer
3: I0327 10:12:04.056531 18326 slave.cpp:261] Mesos agent started on 
(546)@172.17.0.2:35020
3: I0327 10:12:04.056571 18326 slave.cpp:262] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/store/appc"
 --authenticate_http_executors="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/credential"
 --default_role="*" --disallow_sharing_agent_pid_namespace="false" 
--disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/store/docker"
 --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_reregistration_timeout="2secs" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/fetch"
 --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --http_command_executor="false" 
--http_credentials="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/http_credentials"
 --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
--isolation="posix/cpu,posix/mem" 
--jwt_secret_key="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/jwt_secret_key"
 --launcher="posix" --launcher_dir="/tmp/SRC/build/src" --logbufsecs="0" 
--logging_level="INFO" --max_completed_executors_per_framework="150" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --reconfiguration_policy="equal" --recover="reconnect" 
--recovery_timeout="15mins" --registration_backoff_factor="10ms" 
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" 
--revocable_cpu_low_priority="true" 
--runtime_dir="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_8qkWeD" 
--zk_session_timeout="10secs"
3: I0327 10:12:04.057035 18326 credentials.hpp:86] Loading credential for 
authentication from 
'/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/credential'
3: I0327 10:12:04.057212 18326 slave.cpp:294] Agent using credential for: 
test-principal
3: I0327 10:12:04.057235 18326 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_YeoNx5/http_credentials'
3: I0327 10:12:04.057521 18326 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-executor'
3: I0327 10:12:04.057674 18326 http.cpp:980] Creating default 'jwt' HTTP 
authenticator for realm 'mesos-agent-executor'
3: I0327 10:12:04.057922 18326 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-readonly'
3: I0327 10:12:04.058051 18326 http.cpp:980] Creating default 'jwt' HTTP 
authenticator for realm 'mesos-agent-readonly'
3: I0327 10:12:04.058272 18326 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-readwrite'
3: I0327 10:12:04.058408 18326 http.cpp:980] Creating default 'jwt' HTTP 
authenticator for realm 'mesos-agent-readwrite'
3: I0327 10:12:04.058784 18326 disk_profile_adaptor.cpp:80] Creating default 
disk profile adaptor module
3: I0327 10:12:04.060353 18326 slave.cpp:609] Agent resources: 
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
3: I0327 10:12:04.060569 18326 slave.cpp:617] Agent attributes: [ ]
3: I0327 10:12:04.060583 18326 slave.cpp:626] Agent hostname: 1931c74e0c4c
3: I0327 10:12:04.060739 18330 task_status_update_manager.cpp:181] Pausing 
sending task status updates
3: I0327 10:12:04.062536 18331 state.cpp:66] Recovering state from 
'/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_8qkWeD/meta'
3: I0327 10:12:04.062916 18322 task_status_update_manager.cpp:207] Recovering 
task status update manager
3: I0327 10:12:04.063143 18323 containerizer.cpp:674] Recovering containerizer
3: I0327 10:12:04.064961 18330 provisioner.cpp:495] Provisioner recovery 
complete
3: I0327 10:12:04.065325 18336 slave.cpp:7212] Finished recovery
3: I0327 10:12:04.066190 18331 task_status_update_manager.cpp:181] Pausing 
sending task status updates
3: I0327 10:12:04.066213 18336 slave.cpp:1260] New master detected at 
master@172.17.0.2:35020
3: I0327 10:12:04.066336 18336 slave.cpp:1315] Detecting new master
3: I0327 10:12:04.067641 18338 slave.cpp:1342] Authenticating with master 
master@172.17.0.2:35020
3: I0327 10:12:04.067776 18338 slave.cpp:1351] Using default CRAM-MD5 
authenticatee
3: I0327 10:12:04.068178 18322 authenticatee.cpp:121] Creating new client SASL 
connection
3: I0327 10:12:04.068650 18324 master.cpp:9206] Authenticating 
slave(546)@172.17.0.2:35020
3: I0327 10:12:04.068862 18321 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(1085)@172.17.0.2:35020
3: I0327 10:12:04.069332 18327 authenticator.cpp:98] Creating new server SASL 
connection
3: I0327 10:12:04.069733 18335 authenticatee.cpp:213] Received SASL 
authentication mechanisms: CRAM-MD5
3: I0327 10:12:04.069778 18335 authenticatee.cpp:239] Attempting to 
authenticate with mechanism 'CRAM-MD5'
3: I0327 10:12:04.070008 18332 authenticator.cpp:204] Received SASL 
authentication start
3: I0327 10:12:04.070113 18332 authenticator.cpp:326] Authentication requires 
more steps
3: I0327 10:12:04.070336 18323 authenticatee.cpp:259] Received SASL 
authentication step
3: I0327 10:12:04.070583 18342 authenticator.cpp:232] Received SASL 
authentication step
3: I0327 10:12:04.070636 18342 auxprop.cpp:109] Request to lookup properties 
for user: 'test-principal' realm: '1931c74e0c4c' server FQDN: '1931c74e0c4c' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false
3: I0327 10:12:04.070659 18342 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
3: I0327 10:12:04.070724 18342 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
3: I0327 10:12:04.070760 18342 auxprop.cpp:109] Request to lookup properties 
for user: 'test-principal' realm: '1931c74e0c4c' server FQDN: '1931c74e0c4c' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true
3: I0327 10:12:04.070824 18342 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
3: I0327 10:12:04.070832 18342 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
3: I0327 10:12:04.070847 18342 authenticator.cpp:318] Authentication success
3: I0327 10:12:04.070940 18334 authenticatee.cpp:299] Authentication success
3: I0327 10:12:04.071063 18333 master.cpp:9236] Successfully authenticated 
principal 'test-principal' at slave(546)@172.17.0.2:35020
3: I0327 10:12:04.071118 18337 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(1085)@172.17.0.2:35020
3: I0327 10:12:04.071286 18328 slave.cpp:1434] Successfully authenticated with 
master master@172.17.0.2:35020
3: I0327 10:12:04.071718 18328 slave.cpp:1877] Will retry registration in 
383294ns if necessary
3: I0327 10:12:04.071923 18330 master.cpp:6326] Received register agent message 
from slave(546)@172.17.0.2:35020 (1931c74e0c4c)
3: I0327 10:12:04.072154 18330 master.cpp:3802] Authorizing agent providing 
resources 'cpus:2; mem:1024; disk:1024; ports:[31000-32000]' with principal 
'test-principal'
3: I0327 10:12:04.072834 18331 master.cpp:6397] Authorized registration of 
agent at slave(546)@172.17.0.2:35020 (1931c74e0c4c)
3: I0327 10:12:04.072928 18331 master.cpp:6509] Registering agent at 
slave(546)@172.17.0.2:35020 (1931c74e0c4c) with id 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0
3: I0327 10:12:04.073508 18329 registrar.cpp:495] Applied 1 operations in 
237308ns; attempting to update the registry
3: I0327 10:12:04.074270 18321 registrar.cpp:552] Successfully updated the 
registry in 675072ns
3: I0327 10:12:04.074518 18335 master.cpp:6557] Admitted agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c)
3: I0327 10:12:04.075176 18335 master.cpp:6602] Registered agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c) with cpus:2; mem:1024; disk:1024; ports:[31000-32000]
3: I0327 10:12:04.075368 18323 slave.cpp:1877] Will retry registration in 
26.831215ms if necessary
3: I0327 10:12:04.075518 18342 master.cpp:6326] Received register agent message 
from slave(546)@172.17.0.2:35020 (1931c74e0c4c)
3: I0327 10:12:04.075597 18323 slave.cpp:1481] Registered with master 
master@172.17.0.2:35020; given agent ID b5c97327-11cc-4183-82ed-75e62b71cc58-S0
3: I0327 10:12:04.075626 18334 hierarchical.cpp:574] Added agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 (1931c74e0c4c) with cpus:2; mem:1024; 
disk:1024; ports:[31000-32000] (allocated: {})
3: I0327 10:12:04.075739 18341 task_status_update_manager.cpp:188] Resuming 
sending task status updates
3: I0327 10:12:04.075709 18342 master.cpp:3802] Authorizing agent providing 
resources 'cpus:2; mem:1024; disk:1024; ports:[31000-32000]' with principal 
'test-principal'
3: I0327 10:12:04.075896 18323 slave.cpp:1501] Checkpointing SlaveInfo to 
'/tmp/OversubscriptionTest_ForwardUpdateSlaveMessage_8qkWeD/meta/slaves/b5c97327-11cc-4183-82ed-75e62b71cc58-S0/slave.info'
3: I0327 10:12:04.075943 18334 hierarchical.cpp:1517] Performed allocation for 
1 agents in 169342ns
3: I0327 10:12:04.076222 18339 master.cpp:6397] Authorized registration of 
agent at slave(546)@172.17.0.2:35020 (1931c74e0c4c)
3: I0327 10:12:04.076292 18339 master.cpp:6488] Agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c) already registered, resending acknowledgement
3: I0327 10:12:04.076493 18323 slave.cpp:1548] Forwarding agent update 
{"operations":{},"resource_version_uuid":{"value":"rd+fCEbpQsWYa07c\/1tXpw=="},"slave_id":{"value":"b5c97327-11cc-4183-82ed-75e62b71cc58-S0"},"update_oversubscribed_resources":false}
3: W0327 10:12:04.076702 18323 slave.cpp:1530] Already registered with master 
master@172.17.0.2:35020
3: I0327 10:12:04.076735 18323 slave.cpp:1548] Forwarding agent update 
{"operations":{},"resource_version_uuid":{"value":"rd+fCEbpQsWYa07c\/1tXpw=="},"slave_id":{"value":"b5c97327-11cc-4183-82ed-75e62b71cc58-S0"},"update_oversubscribed_resources":false}
3: I0327 10:12:04.077424 18343 master.cpp:7639] Ignoring update on agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c) as it reports no changes
3: I0327 10:12:04.078074 18343 master.cpp:7639] Ignoring update on agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c) as it reports no changes
3: I0327 10:12:04.080782 18341 hierarchical.cpp:1517] Performed allocation for 
1 agents in 140840ns
3: /tmp/SRC/src/tests/oversubscription_tests.cpp:319: Failure
3: Value of: update.isReady()
3: Actual: true
3: Expected: false
3: I0327 10:12:04.082888 18321 slave.cpp:919] Agent terminating
3: I0327 10:12:04.083225 18335 master.cpp:1295] Agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c) disconnected
3: I0327 10:12:04.083271 18335 master.cpp:3283] Disconnecting agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c)
3: I0327 10:12:04.083369 18335 master.cpp:3302] Deactivating agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 at slave(546)@172.17.0.2:35020 
(1931c74e0c4c)
3: I0327 10:12:04.083616 18341 hierarchical.cpp:766] Agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0 deactivated
3: I0327 10:12:04.092846 18320 master.cpp:1137] Master terminating
3: I0327 10:12:04.093572 18323 hierarchical.cpp:609] Removed agent 
b5c97327-11cc-4183-82ed-75e62b71cc58-S0
3: [ FAILED ] OversubscriptionTest.ForwardUpdateSlaveMessage (68 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
