[
https://issues.apache.org/jira/browse/MESOS-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908996#comment-16908996
]
Andrei Sekretenko commented on MESOS-7441:
------------------------------------------
One more similar failure in internal CI:
{code}
[ RUN ] RegisterSlaveValidationTest.DropInvalidRegistration
I0815 19:44:30.927489 27828 cluster.cpp:177] Creating default 'local' authorizer
I0815 19:44:30.928858 27832 master.cpp:440] Master
049efd6d-4fdb-48bd-94bb-60633f9e2178 (ip-172-16-10-239.ec2.internal) started on
172.16.10.239:60696
I0815 19:44:30.928879 27832 master.cpp:443] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="hierarchical"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs"
--authenticators="crammd5" --authorizers="local"
--credentials="/tmp/FoShu5/credentials" --filter_gpu_resources="true"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_operator_event_stream_subscribers="1000"
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false"
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050"
--publish_per_framework_metrics="true" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true"
--version="false" --webui_dir="/usr/local/share/mesos/webui"
--work_dir="/tmp/FoShu5/master" --zk_session_timeout="10secs"
I0815 19:44:30.929015 27832 master.cpp:492] Master only allowing authenticated
frameworks to register
I0815 19:44:30.929023 27832 master.cpp:498] Master only allowing authenticated
agents to register
I0815 19:44:30.929029 27832 master.cpp:504] Master only allowing authenticated
HTTP frameworks to register
I0815 19:44:30.929034 27832 credentials.hpp:37] Loading credentials for
authentication from '/tmp/FoShu5/credentials'
I0815 19:44:30.929141 27832 master.cpp:548] Using default 'crammd5'
authenticator
I0815 19:44:30.929177 27832 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0815 19:44:30.929205 27832 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0815 19:44:30.929224 27832 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0815 19:44:30.929240 27832 master.cpp:629] Authorization enabled
I0815 19:44:30.929502 27835 hierarchical.cpp:241] Initialized hierarchical
allocator process
I0815 19:44:30.929632 27835 whitelist_watcher.cpp:77] No whitelist given
I0815 19:44:30.930215 27831 master.cpp:2168] Elected as the leading master!
I0815 19:44:30.930229 27831 master.cpp:1664] Recovering from registrar
I0815 19:44:30.930265 27831 registrar.cpp:339] Recovering registrar
I0815 19:44:30.930387 27831 registrar.cpp:383] Successfully fetched the
registry (0B) in 109056ns
I0815 19:44:30.930416 27831 registrar.cpp:487] Applied 1 operations in 7828ns;
attempting to update the registry
I0815 19:44:30.930532 27831 registrar.cpp:544] Successfully updated the
registry in 101888ns
I0815 19:44:30.930559 27831 registrar.cpp:416] Successfully recovered registrar
I0815 19:44:30.930629 27831 master.cpp:1817] Recovered 0 agents from the
registry (184B); allowing 10mins for agents to reregister
I0815 19:44:30.930668 27831 hierarchical.cpp:280] Skipping recovery of
hierarchical allocator: nothing to recover
W0815 19:44:30.933223 27828 process.cpp:2877] Attempted to spawn already
running process [email protected]:60696
I0815 19:44:30.933728 27828 containerizer.cpp:318] Using isolation {
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0815 19:44:30.935936 27828 linux_launcher.cpp:144] Using
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0815 19:44:30.936514 27828 provisioner.cpp:300] Using default backend 'aufs'
I0815 19:44:30.937144 27828 cluster.cpp:518] Creating default 'local' authorizer
I0815 19:44:30.937669 27836 slave.cpp:267] Mesos agent started on
(386)@172.16.10.239:60696
I0815 19:44:30.937685 27836 slave.cpp:268] Flags at startup: --acls=""
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/FoShu5/Gndbuc/store/appc"
--authenticate_http_executors="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authentication_timeout_max="1mins"
--authentication_timeout_min="5secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_destroy_timeout="1mins" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos" --credential="/tmp/FoShu5/Gndbuc/credential"
--default_role="*" --disallow_sharing_agent_ipc_namespace="false"
--disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins"
--docker="docker" --docker_ignore_runtime="false" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/FoShu5/Gndbuc/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--docker_volume_chown="false" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/FoShu5/Gndbuc/fetch" --fetcher_cache_size="2GB"
--fetcher_stall_timeout="1mins"
--frameworks_home="/tmp/FoShu5/Gndbuc/frameworks" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --gc_non_executor_container_sandboxes="false"
--help="false" --hostname_lookup="true" --http_command_executor="false"
--http_credentials="/tmp/FoShu5/Gndbuc/http_credentials"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="posix/cpu,posix/mem"
--jwt_secret_key="/tmp/FoShu5/Gndbuc/jwt_secret_key" --launcher="linux"
--launcher_dir="/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-ubuntu-14.04/mesos/build/src"
--logbufsecs="0" --logging_level="INFO"
--max_completed_executors_per_framework="150" --memory_profiling="false"
--network_cni_metrics="true" --network_cni_root_dir_persist="false"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --reconfiguration_policy="equal" --recover="reconnect"
--recovery_timeout="15mins" --registration_backoff_factor="10ms"
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_ntsx99"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_001TO7"
--zk_session_timeout="10secs"
I0815 19:44:30.937845 27836 credentials.hpp:86] Loading credential for
authentication from '/tmp/FoShu5/Gndbuc/credential'
I0815 19:44:30.937888 27836 slave.cpp:300] Agent using credential for:
test-principal
I0815 19:44:30.937894 27836 credentials.hpp:37] Loading credentials for
authentication from '/tmp/FoShu5/Gndbuc/http_credentials'
I0815 19:44:30.937942 27836 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-executor'
I0815 19:44:30.937973 27836 http.cpp:996] Creating default 'jwt' HTTP
authenticator for realm 'mesos-agent-executor'
I0815 19:44:30.938026 27836 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readonly'
I0815 19:44:30.938081 27836 http.cpp:996] Creating default 'jwt' HTTP
authenticator for realm 'mesos-agent-readonly'
I0815 19:44:30.938141 27836 http.cpp:975] Creating default 'basic' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0815 19:44:30.938167 27836 http.cpp:996] Creating default 'jwt' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0815 19:44:30.938251 27836 disk_profile_adaptor.cpp:78] Creating default disk
profile adaptor module
I0815 19:44:30.938696 27836 slave.cpp:615] Agent resources:
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0815 19:44:30.938758 27836 slave.cpp:623] Agent attributes: [ ]
I0815 19:44:30.938764 27836 slave.cpp:632] Agent hostname:
ip-172-16-10-239.ec2.internal
I0815 19:44:30.938899 27834 task_status_update_manager.cpp:181] Pausing sending
task status updates
I0815 19:44:30.938921 27834 status_update_manager_process.hpp:379] Pausing
operation status update manager
I0815 19:44:30.939038 27836 state.cpp:67] Recovering state from
'/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_001TO7/meta'
I0815 19:44:30.939085 27836 slave.cpp:7444] Finished recovering checkpointed
state from
'/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_001TO7/meta',
beginning agent recovery
I0815 19:44:30.939170 27836 task_status_update_manager.cpp:207] Recovering task
status update manager
I0815 19:44:30.939250 27836 containerizer.cpp:821] Recovering Mesos containers
I0815 19:44:30.939301 27836 linux_launcher.cpp:286] Recovering Linux launcher
I0815 19:44:30.939426 27836 containerizer.cpp:1157] Recovering isolators
I0815 19:44:30.939786 27836 containerizer.cpp:1196] Recovering provisioner
I0815 19:44:30.939927 27836 provisioner.cpp:500] Provisioner recovery complete
I0815 19:44:30.940747 27831 composing.cpp:339] Finished recovering all
containerizers
I0815 19:44:30.940807 27831 slave.cpp:7908] Recovering executors
I0815 19:44:30.940826 27831 slave.cpp:8061] Finished recovery
I0815 19:44:30.941097 27835 task_status_update_manager.cpp:181] Pausing sending
task status updates
I0815 19:44:30.941113 27835 status_update_manager_process.hpp:379] Pausing
operation status update manager
I0815 19:44:30.941113 27833 slave.cpp:1351] New master detected at
master@172.16.10.239:60696
I0815 19:44:30.941131 27833 slave.cpp:1416] Detecting new master
I0815 19:44:30.941362 27834 slave.cpp:1443] Authenticating with master
master@172.16.10.239:60696
I0815 19:44:30.941385 27834 slave.cpp:1452] Using default CRAM-MD5 authenticatee
I0815 19:44:30.941454 27832 hierarchical.cpp:1510] Performed allocation for 0
agents in 7326ns
I0815 19:44:30.941464 27834 authenticatee.cpp:121] Creating new client SASL
connection
I0815 19:44:30.941867 27831 master.cpp:10615] Authenticating
slave(386)@172.16.10.239:60696
I0815 19:44:30.941941 27831 authenticator.cpp:414] Starting authentication
session for crammd5-authenticatee(795)@172.16.10.239:60696
I0815 19:44:30.941992 27831 authenticator.cpp:98] Creating new server SASL
connection
I0815 19:44:30.942328 27831 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0815 19:44:30.942351 27831 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0815 19:44:30.942380 27831 authenticator.cpp:204] Received SASL authentication
start
I0815 19:44:30.942411 27831 authenticator.cpp:326] Authentication requires more
steps
I0815 19:44:30.942440 27831 authenticatee.cpp:259] Received SASL authentication
step
I0815 19:44:30.942476 27831 authenticator.cpp:232] Received SASL authentication
step
I0815 19:44:30.942489 27831 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'ip-172-16-10-239.ec2.internal' server FQDN:
'ip-172-16-10-239.ec2.internal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
I0815 19:44:30.942497 27831 auxprop.cpp:181] Looking up auxiliary property
'*userPassword'
I0815 19:44:30.942507 27831 auxprop.cpp:181] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0815 19:44:30.942514 27831 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'ip-172-16-10-239.ec2.internal' server FQDN:
'ip-172-16-10-239.ec2.internal' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
I0815 19:44:30.942520 27831 auxprop.cpp:131] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0815 19:44:30.942525 27831 auxprop.cpp:131] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0815 19:44:30.942538 27831 authenticator.cpp:318] Authentication success
I0815 19:44:30.942576 27831 authenticatee.cpp:299] Authentication success
I0815 19:44:30.942613 27831 master.cpp:10647] Successfully authenticated
principal 'test-principal' at slave(386)@172.16.10.239:60696
I0815 19:44:30.942633 27831 authenticator.cpp:432] Authentication session
cleanup for crammd5-authenticatee(795)@172.16.10.239:60696
I0815 19:44:30.942683 27831 slave.cpp:1543] Successfully authenticated with
master master@172.16.10.239:60696
I0815 19:44:30.942765 27831 slave.cpp:1993] Will retry registration in
4.880988ms if necessary
I0815 19:44:30.943640 27829 slave.cpp:1993] Will retry registration in
11.669214ms if necessary
I0815 19:44:30.943653 27830 master.cpp:7086] Received register agent message
from slave(386)@172.16.10.239:60696 (ip-172-16-10-239.ec2.internal)
I0815 19:44:30.943717 27830 master.cpp:4202] Authorizing agent providing
resources 'cpus:2; mem:1024; disk:1024; ports:[31000-32000]' with principal
'test-principal'
I0815 19:44:30.943888 27835 master.cpp:7153] Authorized registration of agent
at slave(386)@172.16.10.239:60696 (ip-172-16-10-239.ec2.internal)
I0815 19:44:30.943938 27835 master.cpp:7265] Registering agent at
slave(386)@172.16.10.239:60696 (ip-172-16-10-239.ec2.internal) with id
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0
I0815 19:44:30.944078 27829 registrar.cpp:487] Applied 1 operations in 37329ns;
attempting to update the registry
I0815 19:44:30.944200 27830 registrar.cpp:544] Successfully updated the
registry in 0ns
I0815 19:44:30.944242 27830 master.cpp:7313] Admitted agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal)
I0815 19:44:30.944353 27830 master.cpp:7358] Registered agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal) with cpus:2; mem:1024; disk:1024;
ports:[31000-32000]
I0815 19:44:30.944420 27832 hierarchical.cpp:621] Added agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 (ip-172-16-10-239.ec2.internal) with
cpus:2; mem:1024; disk:1024; ports:[31000-32000] (allocated: {})
../../3rdparty/libprocess/include/process/gmock.hpp:667: Failure
Mock function called more times than expected - returning default value.
Function call: filter(@0x7f2fc75fdf00 slave(386)@172.16.10.239:60696,
@0x7f2fa0090a30 248-byte object <90-02 26-C2 2F-7F 00-00 38-E6 06-A0 2F-7F
00-00 18-63 2D-C7 2F-7F 00-00 00-63 2D-C7 2F-7F 00-00 02-00 00-00 AC-10 0A-EF
00-00 00-00 00-00 00-00 00-00 00-00 18-ED 00-00 01-00 00-00 00-00 00-00 ...
00-00 00-00 00-00 00-00 F8-21 69-C7 2F-7F 00-00 01-00 00-00 2F-7F 00-00 02-00
00-00 AC-10 0A-EF 00-00 00-00 00-00 00-00 08-E7 6A-C7 2F-7F 00-00 F0-E6 6A-C7
2F-7F 00-00 88-06 08-A0 2F-7F 00-00>)
Returns: false
Expected: to be never called
Actual: called once - over-saturated and active
I0815 19:44:30.944514 27830 slave.cpp:1576] Registered with master
master@172.16.10.239:60696; given agent ID
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0
I0815 19:44:30.944610 27833 task_status_update_manager.cpp:188] Resuming
sending task status updates
I0815 19:44:30.944654 27832 hierarchical.cpp:1510] Performed allocation for 1
agents in 15896ns
I0815 19:44:30.944960 27830 slave.cpp:1611] Checkpointing SlaveInfo to
'/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_001TO7/meta/slaves/049efd6d-4fdb-48bd-94bb-60633f9e2178-S0/slave.info'
I0815 19:44:30.945199 27833 status_update_manager_process.hpp:385] Resuming
operation status update manager
I0815 19:44:30.945242 27830 slave.cpp:1663] Forwarding agent update
{"operations":{},"resource_providers":{},"resource_version_uuid":{"value":"L/rZrsfDREqXN2INKab/wQ=="},"slave_id":{"value":"049efd6d-4fdb-48bd-94bb-60633f9e2178-S0"},"update_oversubscribed_resources":false}
I0815 19:44:30.945458 27830 master.cpp:8485] Ignoring update on agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal) as it reports no changes
W0815 19:44:30.954190 27832 master.cpp:7071] Dropping registration of agent at
slave(386)@172.16.10.239:60696 because it sent an invalid registration:
'705c555d-1d12-4536-8b46-930a7931edc6/../bfc3b1a7-8917-40cc-8edb-6f09de4b816d/../d42d1b5a-27de-4d1c-bc2c-f64301bb57d1'
contains invalid characters
I0815 19:44:30.954396 27835 slave.cpp:924] Agent terminating
I0815 19:44:30.960286 27835 master.cpp:1295] Agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal) disconnected
I0815 19:44:30.960304 27835 master.cpp:3397] Disconnecting agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal)
I0815 19:44:30.960333 27835 master.cpp:3416] Deactivating agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 at slave(386)@172.16.10.239:60696
(ip-172-16-10-239.ec2.internal)
I0815 19:44:30.960464 27829 hierarchical.cpp:803] Agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0 deactivated
I0815 19:44:30.966850 27834 master.cpp:1135] Master terminating
I0815 19:44:30.966940 27833 hierarchical.cpp:779] Removed all filters for agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0
I0815 19:44:30.966951 27833 hierarchical.cpp:654] Removed agent
049efd6d-4fdb-48bd-94bb-60633f9e2178-S0
[ FAILED ] RegisterSlaveValidationTest.DropInvalidRegistration (43 ms)
{code}
> RegisterSlaveValidationTest.DropInvalidRegistration is flaky
> ------------------------------------------------------------
>
> Key: MESOS-7441
> URL: https://issues.apache.org/jira/browse/MESOS-7441
> Project: Mesos
> Issue Type: Bug
> Components: test
> Environment: ASF CI Configuration autotools,clang,--verbose,GLOG_v=1
> MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)
> Reporter: Yan Xu
> Assignee: Neil Conway
> Priority: Major
> Labels: flaky, flaky-test, foundations
> Fix For: 1.4.0
>
> Attachments: log.txt
>
>
> {noformat:title=}
> [ RUN ] RegisterSlaveValidationTest.DropInvalidRegistration
> ../../3rdparty/libprocess/include/process/gmock.hpp:601: Failure
> Mock function called more times than expected - returning default value.
> Function call: filter(@0x7fd658026000 16-byte object <50-D5 AD-78 D6-7F
> 00-00 B0-BA 01-58 D6-7F 00-00>)
> Returns: false
> Expected: to be never called
> Actual: called once - over-saturated and active
> [ FAILED ] RegisterSlaveValidationTest.DropInvalidRegistration (112 ms)
> {noformat}
> For the verbose log, see
> [jenkins|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3583/consoleFull]
> or the attachment.