Vinod Kone created MESOS-7726:
---------------------------------
Summary: MasterTest.IgnoreOldAgentReregistration test is flaky
Key: MESOS-7726
URL: https://issues.apache.org/jira/browse/MESOS-7726
Project: Mesos
Issue Type: Bug
Reporter: Vinod Kone
Assignee: Neil Conway
Observed this on ASF CI.
{code}
[ RUN ] MasterTest.IgnoreOldAgentReregistration
I0627 05:23:06.031154 4917 cluster.cpp:162] Creating default 'local' authorizer
I0627 05:23:06.033433 4945 master.cpp:438] Master
a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on 172.17.0.4:41747
I0627 05:23:06.033457 4945 master.cpp:440] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/2BARnF/credentials"
--filter_gpu_resources="true" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--http_framework_authenticators="basic" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_agent_ping_timeouts="5" --max_completed_frameworks="50"
--max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui"
--work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs"
I0627 05:23:06.033771 4945 master.cpp:490] Master only allowing authenticated
frameworks to register
I0627 05:23:06.033787 4945 master.cpp:504] Master only allowing authenticated
agents to register
I0627 05:23:06.033798 4945 master.cpp:517] Master only allowing authenticated
HTTP frameworks to register
I0627 05:23:06.033812 4945 credentials.hpp:37] Loading credentials for
authentication from '/tmp/2BARnF/credentials'
I0627 05:23:06.034080 4945 master.cpp:562] Using default 'crammd5'
authenticator
I0627 05:23:06.034221 4945 http.cpp:974] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0627 05:23:06.034409 4945 http.cpp:974] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0627 05:23:06.034569 4945 http.cpp:974] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0627 05:23:06.034688 4945 master.cpp:642] Authorization enabled
I0627 05:23:06.034862 4938 whitelist_watcher.cpp:77] No whitelist given
I0627 05:23:06.034868 4950 hierarchical.cpp:169] Initialized hierarchical
allocator process
I0627 05:23:06.037211 4957 master.cpp:2161] Elected as the leading master!
I0627 05:23:06.037236 4957 master.cpp:1700] Recovering from registrar
I0627 05:23:06.037333 4938 registrar.cpp:345] Recovering registrar
I0627 05:23:06.038146 4938 registrar.cpp:389] Successfully fetched the
registry (0B) in 768256ns
I0627 05:23:06.038290 4938 registrar.cpp:493] Applied 1 operations in 30798ns;
attempting to update the registry
I0627 05:23:06.038861 4938 registrar.cpp:550] Successfully updated the
registry in 510976ns
I0627 05:23:06.038960 4938 registrar.cpp:422] Successfully recovered registrar
I0627 05:23:06.039364 4941 hierarchical.cpp:207] Skipping recovery of
hierarchical allocator: nothing to recover
I0627 05:23:06.039594 4958 master.cpp:1799] Recovered 0 agents from the
registry (129B); allowing 10mins for agents to re-register
I0627 05:23:06.043999 4917 containerizer.cpp:230] Using isolation:
posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
W0627 05:23:06.044456 4917 backend.cpp:76] Failed to create 'aufs' backend:
AufsBackend requires root privileges
W0627 05:23:06.044548 4917 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0627 05:23:06.044580 4917 provisioner.cpp:255] Using default backend 'copy'
I0627 05:23:06.046222 4917 cluster.cpp:448] Creating default 'local' authorizer
I0627 05:23:06.047572 4950 slave.cpp:249] Mesos agent started on
(269)@172.17.0.4:41747
I0627 05:23:06.047591 4950 slave.cpp:250] Flags at startup: --acls=""
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/appc"
--authenticate_http_readonly="true" --authenticate_http_readwrite="true"
--authenticatee="crammd5" --authentication_backoff_factor="1secs"
--authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--credential="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns"
--docker_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_credentials="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="posix/cpu,posix/mem" --launcher="posix"
--launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c"
I0627 05:23:06.047937 4950 credentials.hpp:86] Loading credential for
authentication from
'/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential'
I0627 05:23:06.048065 4950 slave.cpp:282] Agent using credential for:
test-principal
I0627 05:23:06.048085 4950 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials'
I0627 05:23:06.048274 4950 http.cpp:974] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readonly'
I0627 05:23:06.048396 4950 http.cpp:974] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0627 05:23:06.049597 4950 slave.cpp:553] Agent resources:
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.049815 4950 slave.cpp:561] Agent attributes: [ ]
I0627 05:23:06.049830 4950 slave.cpp:566] Agent hostname: c43debbe7e32
I0627 05:23:06.049934 4941 status_update_manager.cpp:177] Pausing sending
status updates
I0627 05:23:06.051394 4957 state.cpp:64] Recovering state from
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta'
I0627 05:23:06.051775 4954 status_update_manager.cpp:203] Recovering status
update manager
I0627 05:23:06.052050 4959 containerizer.cpp:582] Recovering containerizer
I0627 05:23:06.053479 4946 provisioner.cpp:416] Provisioner recovery complete
I0627 05:23:06.053802 4937 slave.cpp:6168] Finished recovery
I0627 05:23:06.054203 4937 slave.cpp:6350] Querying resource estimator for
oversubscribable resources
I0627 05:23:06.054503 4954 slave.cpp:946] New master detected at
[email protected]:41747
I0627 05:23:06.054517 4950 status_update_manager.cpp:177] Pausing sending
status updates
I0627 05:23:06.054641 4954 slave.cpp:981] Detecting new master
I0627 05:23:06.054774 4954 slave.cpp:6364] Received oversubscribable resources
{} from the resource estimator
I0627 05:23:06.065029 4943 slave.cpp:1008] Authenticating with master
[email protected]:41747
I0627 05:23:06.065158 4943 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.065412 4945 authenticatee.cpp:121] Creating new client SASL
connection
I0627 05:23:06.065798 4938 master.cpp:7640] Authenticating
slave(269)@172.17.0.4:41747
I0627 05:23:06.065937 4954 authenticator.cpp:414] Starting authentication
session for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.066174 4937 authenticator.cpp:98] Creating new server SASL
connection
I0627 05:23:06.066419 4958 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0627 05:23:06.066448 4958 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0627 05:23:06.066550 4958 authenticator.cpp:204] Received SASL authentication
start
I0627 05:23:06.066612 4958 authenticator.cpp:326] Authentication requires more
steps
I0627 05:23:06.066845 4952 authenticatee.cpp:259] Received SASL authentication
step
I0627 05:23:06.067041 4936 authenticator.cpp:232] Received SASL authentication
step
I0627 05:23:06.067081 4936 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0627 05:23:06.067096 4936 auxprop.cpp:181] Looking up auxiliary property
'*userPassword'
I0627 05:23:06.067129 4936 auxprop.cpp:181] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0627 05:23:06.067149 4936 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0627 05:23:06.067159 4936 auxprop.cpp:131] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067167 4936 auxprop.cpp:131] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067184 4936 authenticator.cpp:318] Authentication success
I0627 05:23:06.067267 4945 authenticatee.cpp:299] Authentication success
I0627 05:23:06.067364 4954 master.cpp:7670] Successfully authenticated
principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.067433 4943 authenticator.cpp:432] Authentication session
cleanup for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.067656 4939 slave.cpp:1103] Successfully authenticated with
master [email protected]:41747
I0627 05:23:06.067855 4939 slave.cpp:1545] Will retry registration in
1.835348ms if necessary
I0627 05:23:06.068022 4949 master.cpp:5602] Received register agent message
from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068155 4949 master.cpp:3742] Authorizing agent with principal
'test-principal'
I0627 05:23:06.068568 4957 master.cpp:5662] Authorized registration of agent
at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068668 4957 master.cpp:5741] Registering agent at
slave(269)@172.17.0.4:41747 (c43debbe7e32) with id
a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.069067 4937 registrar.cpp:493] Applied 1 operations in 56568ns;
attempting to update the registry
I0627 05:23:06.069788 4937 registrar.cpp:550] Successfully updated the
registry in 659968ns
I0627 05:23:06.069957 4946 master.cpp:5788] Admitted agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32)
I0627 05:23:06.070394 4950 slave.cpp:1545] Will retry registration in
34.487053ms if necessary
I0627 05:23:06.070508 4950 slave.cpp:4866] Received ping from
slave-observer(272)@172.17.0.4:41747
I0627 05:23:06.070776 4950 slave.cpp:1149] Registered with master
[email protected]:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.070569 4946 master.cpp:5819] Registered agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32) with
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.070894 4942 status_update_manager.cpp:184] Resuming sending
status updates
I0627 05:23:06.070890 4949 hierarchical.cpp:587] Added agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 (c43debbe7e32) with cpus:2; mem:1024;
disk:1024; ports:[31000-32000] (allocated: {})
I0627 05:23:06.071032 4946 master.cpp:5602] Received register agent message
from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071089 4950 slave.cpp:1169] Checkpointing SlaveInfo to
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.071142 4946 master.cpp:3742] Authorizing agent with principal
'test-principal'
I0627 05:23:06.071179 4949 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.071238 4949 hierarchical.cpp:1493] Performed allocation for 1
agents in 180904ns
I0627 05:23:06.071403 4950 slave.cpp:1207] Forwarding total oversubscribed
resources {}
I0627 05:23:06.071549 4944 master.cpp:5662] Authorized registration of agent
at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071624 4952 status_update_manager.cpp:177] Pausing sending
status updates
I0627 05:23:06.071624 4944 master.cpp:5719] Agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32) already registered, resending acknowledgement
I0627 05:23:06.071624 4950 slave.cpp:946] New master detected at
[email protected]:41747
I0627 05:23:06.071753 4950 slave.cpp:981] Detecting new master
I0627 05:23:06.071769 4944 master.cpp:6508] Received update of agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.071905 4950 slave.cpp:1149] Registered with master
[email protected]:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.072108 4943 status_update_manager.cpp:184] Resuming sending
status updates
I0627 05:23:06.072243 4950 slave.cpp:1169] Checkpointing SlaveInfo to
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.072391 4948 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.072443 4948 hierarchical.cpp:1493] Performed allocation for 1
agents in 158723ns
I0627 05:23:06.072525 4950 slave.cpp:1207] Forwarding total oversubscribed
resources {}
I0627 05:23:06.072618 4950 slave.cpp:1008] Authenticating with master
[email protected]:41747
I0627 05:23:06.072661 4950 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.072693 4959 master.cpp:6508] Received update of agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.072882 4946 authenticatee.cpp:121] Creating new client SASL
connection
I0627 05:23:06.073169 4944 master.cpp:7640] Authenticating
slave(269)@172.17.0.4:41747
I0627 05:23:06.073289 4950 authenticator.cpp:414] Starting authentication
session for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.073531 4952 authenticator.cpp:98] Creating new server SASL
connection
I0627 05:23:06.073796 4949 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0627 05:23:06.073827 4949 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0627 05:23:06.073998 4936 authenticator.cpp:204] Received SASL authentication
start
I0627 05:23:06.074061 4936 authenticator.cpp:326] Authentication requires more
steps
I0627 05:23:06.074195 4957 authenticatee.cpp:259] Received SASL authentication
step
I0627 05:23:06.074335 4953 authenticator.cpp:232] Received SASL authentication
step
I0627 05:23:06.074370 4953 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0627 05:23:06.074388 4953 auxprop.cpp:181] Looking up auxiliary property
'*userPassword'
I0627 05:23:06.074416 4953 auxprop.cpp:181] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0627 05:23:06.074436 4953 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0627 05:23:06.074446 4953 auxprop.cpp:131] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074455 4953 auxprop.cpp:131] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074468 4953 authenticator.cpp:318] Authentication success
I0627 05:23:06.074553 4956 authenticatee.cpp:299] Authentication success
I0627 05:23:06.074594 4945 master.cpp:7670] Successfully authenticated
principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.074642 4952 authenticator.cpp:432] Authentication session
cleanup for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.074981 4951 slave.cpp:1103] Successfully authenticated with
master [email protected]:41747
../../src/tests/master_tests.cpp:7110: Failure
Failed to wait 15secs for reregisterSlaveMessage
I0627 05:23:21.082969 4953 slave.cpp:818] Agent terminating
I0627 05:23:21.083305 4936 master.cpp:1316] Agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32) disconnected
I0627 05:23:21.083349 4936 master.cpp:3240] Disconnecting agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32)
I0627 05:23:21.083443 4936 master.cpp:3259] Deactivating agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747
(c43debbe7e32)
I0627 05:23:21.083665 4936 hierarchical.cpp:715] Agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 deactivated
I0627 05:23:21.089186 4917 master.cpp:1158] Master terminating
I0627 05:23:21.090092 4944 hierarchical.cpp:620] Removed agent
a8778782-0da1-49a5-9cb8-9f6d11701733-S0
../../3rdparty/libprocess/include/process/gmock.hpp:441: Failure
Actual function call count doesn't match EXPECT_CALL(filter->mock,
filter(testing::A<const MessageEvent&>()))...
Expected args: message matcher (8-byte object <E8-AB 06-80 6C-2B 00-00>,
1-byte object <AB>, 1-byte object <E8>)
Expected: to be called once
Actual: never called - unsatisfied and active
[ FAILED ] MasterTest.IgnoreOldAgentReregistration (15070 ms)
{code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)