Till Toenshoff created MESOS-8048:
-------------------------------------
Summary: ReservationEndpointsTest.GoodReserveAndUnreserveACL is
flaky.
Key: MESOS-8048
URL: https://issues.apache.org/jira/browse/MESOS-8048
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 1.5.0
Environment: Centos7
Reporter: Till Toenshoff
As just observed on our internal CI;
Error Message
{noformat}
../../src/tests/reservation_endpoints_tests.cpp:1026
Value of: (response).get().status
Actual: "409 Conflict"
Expected: Accepted().status
Which is: "202 Accepted"
{noformat}
Log:
{noformat}
00:42:35 [ RUN ] ReservationEndpointsTest.GoodReserveAndUnreserveACL
00:42:35 I0930 00:42:35.517658 7413 cluster.cpp:162] Creating default 'local'
authorizer
00:42:35 I0930 00:42:35.518507 7433 master.cpp:445] Master
938119f3-8007-4d6f-a45b-d49bf76a0590 (ip-172-16-10-96.ec2.internal) started on
172.16.10.96:46227
00:42:35 I0930 00:42:35.518523 7433 master.cpp:447] Flags at startup:
--acls="reserve_resources {
00:42:35 principals {
00:42:35 values: "test-principal"
00:42:35 }
00:42:35 roles {
00:42:35 type: ANY
00:42:35 }
00:42:35 }
00:42:35 unreserve_resources {
00:42:35 principals {
00:42:35 values: "test-principal"
00:42:35 }
00:42:35 reserver_principals {
00:42:35 values: "test-principal"
00:42:35 }
00:42:35 }
00:42:35 " --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="50ms" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/zFIYus/credentials"
--filter_gpu_resources="true" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--http_framework_authenticators="basic" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_agent_ping_timeouts="5" --max_completed_frameworks="50"
--max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false" --roles="role"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/zFIYus/master"
--zk_session_timeout="10secs"
00:42:35 I0930 00:42:35.518672 7433 master.cpp:497] Master only allowing
authenticated frameworks to register
00:42:35 I0930 00:42:35.518681 7433 master.cpp:511] Master only allowing
authenticated agents to register
00:42:35 I0930 00:42:35.518685 7433 master.cpp:524] Master only allowing
authenticated HTTP frameworks to register
00:42:35 I0930 00:42:35.518689 7433 credentials.hpp:37] Loading credentials
for authentication from '/tmp/zFIYus/credentials'
00:42:35 I0930 00:42:35.518784 7433 master.cpp:569] Using default 'crammd5'
authenticator
00:42:35 I0930 00:42:35.518823 7433 http.cpp:1045] Creating default 'basic'
HTTP authenticator for realm 'mesos-master-readonly'
00:42:35 I0930 00:42:35.518853 7433 http.cpp:1045] Creating default 'basic'
HTTP authenticator for realm 'mesos-master-readwrite'
00:42:35 I0930 00:42:35.518877 7433 http.cpp:1045] Creating default 'basic'
HTTP authenticator for realm 'mesos-master-scheduler'
00:42:35 I0930 00:42:35.518898 7433 master.cpp:649] Authorization enabled
00:42:35 W0930 00:42:35.518905 7433 master.cpp:712] The '--roles' flag is
deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade
notes for more information
00:42:35 I0930 00:42:35.519016 7438 whitelist_watcher.cpp:77] No whitelist
given
00:42:35 I0930 00:42:35.519018 7439 hierarchical.cpp:171] Initialized
hierarchical allocator process
00:42:35 I0930 00:42:35.519625 7433 master.cpp:2216] Elected as the leading
master!
00:42:35 I0930 00:42:35.519640 7433 master.cpp:1705] Recovering from registrar
00:42:35 I0930 00:42:35.519677 7433 registrar.cpp:347] Recovering registrar
00:42:35 I0930 00:42:35.519762 7438 registrar.cpp:391] Successfully fetched
the registry (0B) in 70144ns
00:42:35 I0930 00:42:35.519783 7438 registrar.cpp:495] Applied 1 operations
in 3246ns; attempting to update the registry
00:42:35 I0930 00:42:35.519870 7439 registrar.cpp:552] Successfully updated
the registry in 78080ns
00:42:35 I0930 00:42:35.519899 7439 registrar.cpp:424] Successfully recovered
registrar
00:42:35 I0930 00:42:35.519975 7439 master.cpp:1809] Recovered 0 agents from
the registry (168B); allowing 10mins for agents to re-register
00:42:35 I0930 00:42:35.520007 7435 hierarchical.cpp:209] Skipping recovery
of hierarchical allocator: nothing to recover
00:42:35 W0930 00:42:35.521775 7413 process.cpp:3194] Attempted to spawn
already running process [email protected]:46227
00:42:35 I0930 00:42:35.522099 7413 containerizer.cpp:292] Using isolation {
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
00:42:35 I0930 00:42:35.527375 7413 linux_launcher.cpp:146] Using
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
00:42:35 I0930 00:42:35.527729 7413 provisioner.cpp:255] Using default
backend 'overlay'
00:42:35 I0930 00:42:35.528136 7413 cluster.cpp:448] Creating default 'local'
authorizer
00:42:35 I0930 00:42:35.528524 7439 slave.cpp:254] Mesos agent started on
(409)@172.16.10.96:46227
00:42:35 I0930 00:42:35.528540 7439 slave.cpp:255] Flags at startup:
--acls="" --appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/store/appc"
--authenticate_http_readonly="true" --authenticate_http_readwrite="true"
--authenticatee="crammd5" --authentication_backoff_factor="1secs"
--authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--credential="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/credential"
--default_role="*" --disallow_sharing_agent_pid_namespace="false"
--disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_credentials="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/http_credentials"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="posix/cpu,posix/mem" --launcher="linux"
--launcher_dir="/home/centos/workspace/mesos/Mesos_CI-build/FLAG/Plain/label/mesos-ec2-centos-7/mesos/build/src"
--logbufsecs="0" --logging_level="INFO"
--max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms" --resources="cpus:1;mem:512"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_wtxBTT"
--zk_session_timeout="10secs"
00:42:35 I0930 00:42:35.528676 7439 credentials.hpp:86] Loading credential
for authentication from
'/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/credential'
00:42:35 I0930 00:42:35.528724 7439 slave.cpp:287] Agent using credential
for: test-principal
00:42:35 I0930 00:42:35.528730 7439 credentials.hpp:37] Loading credentials
for authentication from
'/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_o9Veil/http_credentials'
00:42:35 I0930 00:42:35.528782 7439 http.cpp:1045] Creating default 'basic'
HTTP authenticator for realm 'mesos-agent-readonly'
00:42:35 I0930 00:42:35.528818 7439 http.cpp:1045] Creating default 'basic'
HTTP authenticator for realm 'mesos-agent-readwrite'
00:42:35 I0930 00:42:35.529382 7439 slave.cpp:585] Agent resources:
[{"name":"cpus","scalar":{"value":1.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":512.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":35823.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
00:42:35 I0930 00:42:35.529448 7439 slave.cpp:593] Agent attributes: [ ]
00:42:35 I0930 00:42:35.529454 7439 slave.cpp:602] Agent hostname:
ip-172-16-10-96.ec2.internal
00:42:35 I0930 00:42:35.529676 7435 status_update_manager.cpp:177] Pausing
sending status updates
00:42:35 I0930 00:42:35.529700 7437 state.cpp:64] Recovering state from
'/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_wtxBTT/meta'
00:42:35 I0930 00:42:35.529767 7437 status_update_manager.cpp:203] Recovering
status update manager
00:42:35 I0930 00:42:35.529822 7437 containerizer.cpp:648] Recovering
containerizer
00:42:35 I0930 00:42:35.530825 7437 provisioner.cpp:416] Provisioner recovery
complete
00:42:35 I0930 00:42:35.530910 7437 slave.cpp:6313] Finished recovery
00:42:35 I0930 00:42:35.531136 7437 slave.cpp:6495] Querying resource
estimator for oversubscribable resources
00:42:35 I0930 00:42:35.531191 7439 status_update_manager.cpp:177] Pausing
sending status updates
00:42:35 I0930 00:42:35.531201 7436 slave.cpp:993] New master detected at
[email protected]:46227
00:42:35 I0930 00:42:35.531231 7436 slave.cpp:1028] Detecting new master
00:42:35 I0930 00:42:35.531263 7436 slave.cpp:6509] Received oversubscribable
resources {} from the resource estimator
00:42:35 I0930 00:42:35.539541 7435 slave.cpp:1055] Authenticating with
master [email protected]:46227
00:42:35 I0930 00:42:35.539571 7435 slave.cpp:1066] Using default CRAM-MD5
authenticatee
00:42:35 I0930 00:42:35.539618 7435 authenticatee.cpp:121] Creating new
client SASL connection
00:42:35 I0930 00:42:35.540117 7435 master.cpp:7915] Authenticating
slave(409)@172.16.10.96:46227
00:42:35 I0930 00:42:35.540163 7435 authenticator.cpp:414] Starting
authentication session for crammd5-authenticatee(861)@172.16.10.96:46227
00:42:35 I0930 00:42:35.540216 7435 authenticator.cpp:98] Creating new server
SASL connection
00:42:35 I0930 00:42:35.540619 7435 authenticatee.cpp:213] Received SASL
authentication mechanisms: CRAM-MD5
00:42:35 I0930 00:42:35.540637 7435 authenticatee.cpp:239] Attempting to
authenticate with mechanism 'CRAM-MD5'
00:42:35 I0930 00:42:35.540665 7435 authenticator.cpp:204] Received SASL
authentication start
00:42:35 I0930 00:42:35.540694 7435 authenticator.cpp:326] Authentication
requires more steps
00:42:35 I0930 00:42:35.540724 7435 authenticatee.cpp:259] Received SASL
authentication step
00:42:35 I0930 00:42:35.540766 7435 authenticator.cpp:232] Received SASL
authentication step
00:42:35 I0930 00:42:35.540782 7435 auxprop.cpp:109] Request to lookup
properties for user: 'test-principal' realm: 'ip-172-16-10-96.ec2.internal'
server FQDN: 'ip-172-16-10-96.ec2.internal' SASL_AUXPROP_VERIFY_AGAINST_HASH:
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
00:42:35 I0930 00:42:35.540791 7435 auxprop.cpp:181] Looking up auxiliary
property '*userPassword'
00:42:35 I0930 00:42:35.540804 7435 auxprop.cpp:181] Looking up auxiliary
property '*cmusaslsecretCRAM-MD5'
00:42:35 I0930 00:42:35.540813 7435 auxprop.cpp:109] Request to lookup
properties for user: 'test-principal' realm: 'ip-172-16-10-96.ec2.internal'
server FQDN: 'ip-172-16-10-96.ec2.internal' SASL_AUXPROP_VERIFY_AGAINST_HASH:
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
00:42:35 I0930 00:42:35.540819 7435 auxprop.cpp:131] Skipping auxiliary
property '*userPassword' since SASL_AUXPROP_AUTHZID == true
00:42:35 I0930 00:42:35.540824 7435 auxprop.cpp:131] Skipping auxiliary
property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
00:42:35 I0930 00:42:35.540835 7435 authenticator.cpp:318] Authentication
success
00:42:35 I0930 00:42:35.540881 7436 authenticatee.cpp:299] Authentication
success
00:42:35 I0930 00:42:35.540894 7435 master.cpp:7945] Successfully
authenticated principal 'test-principal' at slave(409)@172.16.10.96:46227
00:42:35 I0930 00:42:35.540925 7432 authenticator.cpp:432] Authentication
session cleanup for crammd5-authenticatee(861)@172.16.10.96:46227
00:42:35 I0930 00:42:35.540977 7436 slave.cpp:1150] Successfully
authenticated with master [email protected]:46227
00:42:35 I0930 00:42:35.541028 7436 slave.cpp:1629] Will retry registration
in 11.469653ms if necessary
00:42:35 I0930 00:42:35.541095 7438 master.cpp:5819] Received register agent
message from slave(409)@172.16.10.96:46227 (ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.541121 7438 master.cpp:3856] Authorizing agent with
principal 'test-principal'
00:42:35 I0930 00:42:35.541203 7432 master.cpp:5879] Authorized registration
of agent at slave(409)@172.16.10.96:46227 (ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.541246 7432 master.cpp:5972] Registering agent at
slave(409)@172.16.10.96:46227 (ip-172-16-10-96.ec2.internal) with id
938119f3-8007-4d6f-a45b-d49bf76a0590-S0
00:42:35 I0930 00:42:35.541350 7432 registrar.cpp:495] Applied 1 operations
in 10060ns; attempting to update the registry
00:42:35 I0930 00:42:35.541492 7432 registrar.cpp:552] Successfully updated
the registry in 120064ns
00:42:35 I0930 00:42:35.541545 7437 master.cpp:6019] Admitted agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.541666 7437 master.cpp:6050] Registered agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal) with
[{"name":"cpus","scalar":{"value":1.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":512.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":35823.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
00:42:35 I0930 00:42:35.541733 7439 hierarchical.cpp:593] Added agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 (ip-172-16-10-96.ec2.internal) with
cpus:1; mem:512; disk:35823; ports:[31000-32000] (allocated: {})
00:42:35 I0930 00:42:35.541798 7437 slave.cpp:1196] Registered with master
[email protected]:46227; given agent ID
938119f3-8007-4d6f-a45b-d49bf76a0590-S0
00:42:35 I0930 00:42:35.541828 7439 hierarchical.cpp:1943] No allocations
performed
00:42:35 I0930 00:42:35.541842 7439 hierarchical.cpp:1486] Performed
allocation for 1 agents in 32679ns
00:42:35 I0930 00:42:35.541879 7439 status_update_manager.cpp:184] Resuming
sending status updates
00:42:35 I0930 00:42:35.542570 7433 process.cpp:3929] Handling HTTP event for
process 'master' with path: '/master/reserve'
00:42:35 I0930 00:42:35.542874 7438 http.cpp:1185] HTTP POST for
/master/reserve from 172.16.10.96:54256
00:42:35 I0930 00:42:35.543103 7437 slave.cpp:1216] Checkpointing SlaveInfo
to
'/tmp/ReservationEndpointsTest_GoodReserveAndUnreserveACL_wtxBTT/meta/slaves/938119f3-8007-4d6f-a45b-d49bf76a0590-S0/slave.info'
00:42:35 I0930 00:42:35.543090 7438 master.cpp:3641] Authorizing principal
'test-principal' to reserve resources
'[{"name":"cpus","reservations":[{"principal":"test-principal","role":"role","type":"DYNAMIC"}],"scalar":{"value":1.0},"type":"SCALAR"},{"name":"mem","reservations":[{"principal":"test-principal","role":"role","type":"DYNAMIC"}],"scalar":{"value":512.0},"type":"SCALAR"}]'
00:42:35 I0930 00:42:35.543285 7437 slave.cpp:1265] Forwarding total
oversubscribed resources {}
00:42:35 I0930 00:42:35.543319 7437 slave.cpp:4969] Received ping from
slave-observer(413)@172.16.10.96:46227
00:42:35 I0930 00:42:35.543632 7438 master.cpp:6814] Received update of agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal) with total oversubscribed resources {}
00:42:35 I0930 00:42:35.543754 7438 master.cpp:9314] Sending updated
checkpointed resources cpus(reservations: [(DYNAMIC,role,test-principal)]):1;
mem(reservations: [(DYNAMIC,role,test-principal)]):512 to agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.543889 7437 hierarchical.cpp:660] Agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 (ip-172-16-10-96.ec2.internal) updated
with total resources cpus:1; mem:512; disk:35823; ports:[31000-32000]
00:42:35 I0930 00:42:35.543952 7437 hierarchical.cpp:1943] No allocations
performed
00:42:35 I0930 00:42:35.543967 7437 hierarchical.cpp:1486] Performed
allocation for 1 agents in 29057ns
00:42:35 I0930 00:42:35.544109 7438 slave.cpp:3522] Updated checkpointed
resources from {} to cpus(reservations: [(DYNAMIC,role,test-principal)]):1;
mem(reservations: [(DYNAMIC,role,test-principal)]):512
00:42:35 I0930 00:42:35.544886 7439 process.cpp:3929] Handling HTTP event for
process 'master' with path: '/master/unreserve'
00:42:35 I0930 00:42:35.545197 7437 http.cpp:1185] HTTP POST for
/master/unreserve from 172.16.10.96:54258
00:42:35 I0930 00:42:35.545383 7437 master.cpp:3709] Authorizing principal
'test-principal' to unreserve resources
'[{"name":"cpus","reservations":[{"principal":"test-principal","role":"role","type":"DYNAMIC"}],"scalar":{"value":1.0},"type":"SCALAR"},{"name":"mem","reservations":[{"principal":"test-principal","role":"role","type":"DYNAMIC"}],"scalar":{"value":512.0},"type":"SCALAR"}]'
00:42:35 ../../src/tests/reservation_endpoints_tests.cpp:1026: Failure
00:42:35 Value of: (response).get().status
00:42:35 Actual: "409 Conflict"
00:42:35 Expected: Accepted().status
00:42:35 Which is: "202 Accepted"
00:42:35 I0930 00:42:35.546277 7413 slave.cpp:869] Agent terminating
00:42:35 I0930 00:42:35.546371 7439 master.cpp:1321] Agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal) disconnected
00:42:35 I0930 00:42:35.546391 7439 master.cpp:3354] Disconnecting agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.546404 7439 master.cpp:3373] Deactivating agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 at slave(409)@172.16.10.96:46227
(ip-172-16-10-96.ec2.internal)
00:42:35 I0930 00:42:35.546520 7438 hierarchical.cpp:690] Agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0 deactivated
00:42:35 I0930 00:42:35.547936 7413 master.cpp:1163] Master terminating
00:42:35 I0930 00:42:35.548065 7439 hierarchical.cpp:626] Removed agent
938119f3-8007-4d6f-a45b-d49bf76a0590-S0
00:42:35 [ FAILED ] ReservationEndpointsTest.GoodReserveAndUnreserveACL (33
ms)
{noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)