Vinod Kone created MESOS-6911:
---------------------------------
Summary: SlaveRecoveryTest/0.RegisterDisconnectedSlave test is
flaky
Key: MESOS-6911
URL: https://issues.apache.org/jira/browse/MESOS-6911
Project: Mesos
Issue Type: Bug
Environment: ASF CI
Reporter: Vinod Kone
Assignee: Vinod Kone
Observed this on ASF CI:
{code}
[ RUN ] SlaveRecoveryTest/0.RegisterDisconnectedSlave
I0111 18:52:55.691341 27071 cluster.cpp:160] Creating default 'local' authorizer
I0111 18:52:55.693492 27076 master.cpp:383] Master
f193ab26-63f6-4284-9de0-cd89bb73adb4 (b09aa0a37318) started on 172.17.0.3:42061
I0111 18:52:55.693522 27076 master.cpp:385] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocat
or="HierarchicalDRF" --authenticate_agents="false"
--authenticate_frameworks="true" --authenticate_http_frameworks="true"
--authenticate_http_readonly="true" --authenticate_http
_readwrite="true" --authenticators="crammd5" --authorizers="local"
--credentials="/tmp/e83N9C/credentials" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --htt
p_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_ag
ent_ping_timeouts="5" --max_completed_frameworks="50"
--max_completed_tasks_per_framework="1000" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="in_memory" --r
egistry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry
_strict="false" --root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/e83N9C/master"
--zk_session_timeout="
10secs"
I0111 18:52:55.693800 27076 master.cpp:435] Master only allowing authenticated
frameworks to register
I0111 18:52:55.693811 27076 master.cpp:451] Master allowing unauthenticated
agents to register
I0111 18:52:55.693819 27076 master.cpp:462] Master only allowing authenticated
HTTP frameworks to register
I0111 18:52:55.693825 27076 credentials.hpp:37] Loading credentials for
authentication from '/tmp/e83N9C/credentials'
I0111 18:52:55.693991 27076 master.cpp:507] Using default 'crammd5'
authenticator
I0111 18:52:55.694067 27076 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0111 18:52:55.694119 27076 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0111 18:52:55.694142 27076 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0111 18:52:55.694156 27076 master.cpp:587] Authorization enabled
I0111 18:52:55.694834 27083 hierarchical.cpp:149] Initialized hierarchical
allocator process
I0111 18:52:55.694885 27083 whitelist_watcher.cpp:77] No whitelist given
I0111 18:52:55.695083 27076 master.cpp:2119] Elected as the leading master!
I0111 18:52:55.695096 27076 master.cpp:1641] Recovering from registrar
I0111 18:52:55.695166 27076 registrar.cpp:329] Recovering registrar
I0111 18:52:55.695739 27076 registrar.cpp:362] Successfully fetched the
registry (0B) in 551936ns
I0111 18:52:55.695775 27076 registrar.cpp:461] Applied 1 operations in 4972ns;
attempting to update the registry
I0111 18:52:55.696069 27076 registrar.cpp:506] Successfully updated the
registry in 273920ns
I0111 18:52:55.696121 27076 registrar.cpp:392] Successfully recovered registrar
I0111 18:52:55.696553 27079 hierarchical.cpp:176] Skipping recovery of
hierarchical allocator: nothing to recover
I0111 18:52:55.696570 27073 master.cpp:1757] Recovered 0 agents from the
registry (129B); allowing 10mins for agents to re-register
I0111 18:52:55.698655 27071 containerizer.cpp:220] Using isolation:
posix/cpu,posix/mem,filesystem/posix,network/cni
W0111 18:52:55.699059 27071 backend.cpp:76] Failed to create 'aufs' backend:
AufsBackend requires root privileges, but is running as user mesos
W0111 18:52:55.699148 27071 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0111 18:52:55.701257 27071 cluster.cpp:446] Creating default 'local' authorizer
I0111 18:52:55.702049 27077 slave.cpp:209] Mesos agent started on
(485)@172.17.0.3:42061
I0111 18:52:55.702069 27077 slave.cpp:210] Flags at startup: --acls=""
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos"
--credential="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--http_command_executor="false"
--http_credentials="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/http_credentials"
--http_heartbeat_interval="30secs" --image_provisioner_backend="copy"
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher="posix" --launcher_dir="/mesos/build/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT"
I0111 18:52:55.702533 27077 credentials.hpp:86] Loading credential for
authentication from
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/credential'
I0111 18:52:55.702687 27077 slave.cpp:352] Agent using credential for:
test-principal
I0111 18:52:55.702721 27077 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/http_credentials'
I0111 18:52:55.702865 27077 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-agent-readonly'
I0111 18:52:55.703058 27077 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0111 18:52:55.703541 27077 slave.cpp:539] Agent resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0111 18:52:55.703575 27077 slave.cpp:547] Agent attributes: [ ]
I0111 18:52:55.703583 27077 slave.cpp:552] Agent hostname: b09aa0a37318
I0111 18:52:55.704030 27086 state.cpp:60] Recovering state from
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta'
I0111 18:52:55.704193 27086 status_update_manager.cpp:203] Recovering status
update manager
I0111 18:52:55.704581 27086 containerizer.cpp:599] Recovering containerizer
I0111 18:52:55.705227 27080 provisioner.cpp:251] Provisioner recovery complete
I0111 18:52:55.705379 27086 slave.cpp:5421] Finished recovery
I0111 18:52:55.705912 27086 slave.cpp:5595] Querying resource estimator for
oversubscribable resources
I0111 18:52:55.706006 27086 slave.cpp:5609] Received oversubscribable resources
{} from the resource estimator
I0111 18:52:55.706210 27081 status_update_manager.cpp:177] Pausing sending
status updates
I0111 18:52:55.706233 27078 slave.cpp:924] New master detected at
[email protected]:42061
I0111 18:52:55.706285 27078 slave.cpp:959] Detecting new master
I0111 18:52:55.715513 27078 slave.cpp:986] Authenticating with master
[email protected]:42061
I0111 18:52:55.715556 27078 slave.cpp:997] Using default CRAM-MD5 authenticatee
I0111 18:52:55.715667 27077 authenticatee.cpp:121] Creating new client SASL
connection
I0111 18:52:55.715894 27081 master.cpp:6835] Authenticating
slave(485)@172.17.0.3:42061
I0111 18:52:55.715944 27077 authenticator.cpp:414] Starting authentication
session for crammd5-authenticatee(986)@172.17.0.3:42061
I0111 18:52:55.716037 27075 authenticator.cpp:98] Creating new server SASL
connection
I0111 18:52:55.716234 27080 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0111 18:52:55.716277 27080 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0111 18:52:55.716333 27080 authenticator.cpp:204] Received SASL authentication
start
I0111 18:52:55.716477 27080 authenticator.cpp:326] Authentication requires more
steps
I0111 18:52:55.716605 27080 authenticatee.cpp:259] Received SASL authentication
step
I0111 18:52:55.716790 27080 authenticator.cpp:232] Received SASL authentication
step
I0111 18:52:55.716907 27080 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'b09aa0a37318' server FQDN: 'b09aa0a37318'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0111 18:52:55.717041 27080 auxprop.cpp:181] Looking up auxiliary property
'*userPassword'
I0111 18:52:55.717185 27080 auxprop.cpp:181] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0111 18:52:55.717207 27080 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'b09aa0a37318' server FQDN: 'b09aa0a37318'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0111 18:52:55.717247 27080 auxprop.cpp:131] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0111 18:52:55.717257 27080 auxprop.cpp:131] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0111 18:52:55.717296 27080 authenticator.cpp:318] Authentication success
I0111 18:52:55.717373 27080 authenticatee.cpp:299] Authentication success
I0111 18:52:55.717404 27086 master.cpp:6865] Successfully authenticated
principal 'test-principal' at slave(485)@172.17.0.3:42061
I0111 18:52:55.717432 27081 authenticator.cpp:432] Authentication session
cleanup for crammd5-authenticatee(986)@172.17.0.3:42061
I0111 18:52:55.717664 27080 slave.cpp:1081] Successfully authenticated with
master [email protected]:42061
I0111 18:52:55.717756 27080 slave.cpp:1503] Will retry registration in
17.356946ms if necessary
I0111 18:52:55.717931 27081 master.cpp:5234] Registering agent at
slave(485)@172.17.0.3:42061 (b09aa0a37318) with id
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0
I0111 18:52:55.718101 27077 registrar.cpp:461] Applied 1 operations in 20803ns;
attempting to update the registry
I0111 18:52:55.718472 27076 registrar.cpp:506] Successfully updated the
registry in 338176ns
I0111 18:52:55.719049 27081 slave.cpp:4285] Received ping from
slave-observer(462)@172.17.0.3:42061
I0111 18:52:55.719120 27081 slave.cpp:1127] Registered with master
[email protected]:42061; given agent ID f193ab26-63f6-4284-9de0-cd89bb73adb4-S0
I0111 18:52:55.719139 27081 fetcher.cpp:90] Clearing fetcher cache
I0111 18:52:55.719128 27084 master.cpp:5305] Registered agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0111 18:52:55.719342 27084 hierarchical.cpp:491] Added agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 (b09aa0a37318) with cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I0111 18:52:55.719408 27084 hierarchical.cpp:1690] No allocations performed
I0111 18:52:55.719430 27084 hierarchical.cpp:1315] Performed allocation for
agent f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 in 61136ns
I0111 18:52:55.719466 27081 slave.cpp:1155] Checkpointing SlaveInfo to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/slave.info'
I0111 18:52:55.719480 27084 status_update_manager.cpp:184] Resuming sending
status updates
I0111 18:52:55.719810 27081 slave.cpp:1193] Forwarding total oversubscribed
resources {}
I0111 18:52:55.719862 27081 master.cpp:5712] Received update of agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) with total oversubscribed resources {}
I0111 18:52:55.720095 27087 hierarchical.cpp:561] Agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 (b09aa0a37318) updated with
oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000], allocated: {})
I0111 18:52:55.720407 27087 hierarchical.cpp:1690] No allocations performed
I0111 18:52:55.720433 27087 hierarchical.cpp:1315] Performed allocation for
agent f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 in 161873ns
I0111 18:52:55.721096 27071 sched.cpp:232] Version: 1.2.0
I0111 18:52:55.721359 27079 sched.cpp:336] New master detected at
[email protected]:42061
I0111 18:52:55.721395 27079 sched.cpp:407] Authenticating with master
[email protected]:42061
I0111 18:52:55.721410 27079 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0111 18:52:55.721606 27072 authenticatee.cpp:121] Creating new client SASL
connection
I0111 18:52:55.721803 27072 master.cpp:6835] Authenticating
[email protected]:42061
I0111 18:52:55.722048 27072 authenticator.cpp:414] Starting authentication
session for crammd5-authenticatee(987)@172.17.0.3:42061
I0111 18:52:55.722317 27072 authenticator.cpp:98] Creating new server SASL
connection
I0111 18:52:55.722518 27072 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0111 18:52:55.722548 27072 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0111 18:52:55.722594 27072 authenticator.cpp:204] Received SASL authentication
start
I0111 18:52:55.722645 27072 authenticator.cpp:326] Authentication requires more
steps
I0111 18:52:55.722689 27072 authenticatee.cpp:259] Received SASL authentication
step
I0111 18:52:55.722745 27072 authenticator.cpp:232] Received SASL authentication
step
I0111 18:52:55.722771 27072 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'b09aa0a37318' server FQDN: 'b09aa0a37318'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0111 18:52:55.722784 27072 auxprop.cpp:181] Looking up auxiliary property
'*userPassword'
I0111 18:52:55.722797 27072 auxprop.cpp:181] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0111 18:52:55.722815 27072 auxprop.cpp:109] Request to lookup properties for
user: 'test-principal' realm: 'b09aa0a37318' server FQDN: 'b09aa0a37318'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0111 18:52:55.722826 27072 auxprop.cpp:131] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0111 18:52:55.722832 27072 auxprop.cpp:131] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0111 18:52:55.722847 27072 authenticator.cpp:318] Authentication success
I0111 18:52:55.722895 27072 authenticatee.cpp:299] Authentication success
I0111 18:52:55.722937 27072 master.cpp:6865] Successfully authenticated
principal 'test-principal' at
[email protected]:42061
I0111 18:52:55.722971 27072 authenticator.cpp:432] Authentication session
cleanup for crammd5-authenticatee(987)@172.17.0.3:42061
I0111 18:52:55.723062 27072 sched.cpp:513] Successfully authenticated with
master [email protected]:42061
I0111 18:52:55.723081 27072 sched.cpp:836] Sending SUBSCRIBE call to
[email protected]:42061
I0111 18:52:55.723125 27072 sched.cpp:869] Will retry registration in
1.630125153secs if necessary
I0111 18:52:55.723235 27072 master.cpp:2707] Received SUBSCRIBE call for
framework 'default' at
[email protected]:42061
I0111 18:52:55.723264 27072 master.cpp:2155] Authorizing framework principal
'test-principal' to receive offers for role '*'
I0111 18:52:55.723398 27072 master.cpp:2783] Subscribing framework default with
checkpointing enabled and capabilities [ ]
I0111 18:52:55.723603 27072 hierarchical.cpp:277] Added framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.723865 27072 hierarchical.cpp:1785] No inverse offers to send
out!
I0111 18:52:55.723897 27072 hierarchical.cpp:1292] Performed allocation for 1
agents in 273576ns
I0111 18:52:55.724014 27072 sched.cpp:759] Framework registered with
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.724050 27072 sched.cpp:773] Scheduler::registered took 15750ns
I0111 18:52:55.724215 27072 master.cpp:6664] Sending 1 offers to framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (default) at
[email protected]:42061
I0111 18:52:55.724386 27072 sched.cpp:933] Scheduler::resourceOffers took
36299ns
I0111 18:52:55.725075 27085 master.cpp:3662] Processing ACCEPT call for offers:
[ f193ab26-63f6-4284-9de0-cd89bb73adb4-O0 ] on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) for framework f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
(default) at [email protected]:42061
I0111 18:52:55.725113 27085 master.cpp:3249] Authorizing framework principal
'test-principal' to launch task b969d109-e0fb-4308-9290-92b22ae8c4b7
I0111 18:52:55.725637 27085 master.cpp:8581] Adding task
b969d109-e0fb-4308-9290-92b22ae8c4b7 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.725724 27085 master.cpp:4313] Launching task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (default) at
[email protected]:42061 with resources
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.726084 27085 slave.cpp:1571] Got assigned task
'b969d109-e0fb-4308-9290-92b22ae8c4b7' for framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.726239 27085 slave.cpp:6262] Checkpointing FrameworkInfo to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/framework.info'
I0111 18:52:55.726835 27085 slave.cpp:6273] Checkpointing framework pid
'[email protected]:42061' to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/framework.pid'
I0111 18:52:55.727361 27085 slave.cpp:1731] Launching task
'b969d109-e0fb-4308-9290-92b22ae8c4b7' for framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.727908 27085 paths.cpp:530] Trying to chown
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858'
to user 'mesos'
I0111 18:52:55.732523 27085 slave.cpp:6739] Checkpointing ExecutorInfo to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/executor.info'
I0111 18:52:55.733202 27085 slave.cpp:6348] Launching executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 with resources cpus(*):0.1; mem(*):32
in work directory
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858'
I0111 18:52:55.733749 27077 containerizer.cpp:991] Starting container
5ca261a9-7655-4473-ad23-4638c856c858 for executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.733780 27085 slave.cpp:6762] Checkpointing TaskInfo to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858/tasks/b969d109-e0fb-4308-9290-92b22ae8c4b7/task.info'
I0111 18:52:55.734252 27085 slave.cpp:2053] Queued task
'b969d109-e0fb-4308-9290-92b22ae8c4b7' for executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.734329 27085 slave.cpp:877] Successfully attached file
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858'
I0111 18:52:55.735308 27081 containerizer.cpp:1540] Launching
'mesos-containerizer' with flags '--help="false"
--launch_info="{"command":{"arguments":["mesos-executor","--launcher_dir=\/mesos\/build\/src"],"shell":false,"value":"\/mesos\/build\/src\/mesos-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","value":"172.17.0.3:42061"},{"name":"MESOS_CHECKPOINT","value":"1"},{"name":"MESOS_DIRECTORY","value":"\/tmp\/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT\/slaves\/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0\/frameworks\/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000\/executors\/b969d109-e0fb-4308-9290-92b22ae8c4b7\/runs\/5ca261a9-7655-4473-ad23-4638c856c858"},{"name":"MESOS_EXECUTOR_ID","value":"b969d109-e0fb-4308-9290-92b22ae8c4b7"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","value":"f193ab26-63f6-4284-9de0-cd89bb73adb4-0000"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","value":"0"},{"name":"MESOS_RECOVERY_TIMEOUT","value":"15mins"},{"name":"MESOS_SLAVE_ID","value":"f193ab26-63f6-4284-9de0-cd89bb73adb4-S0"},{"name":"MESOS_SLAVE_PID","value":"slave(485)@172.17.0.3:42061"},{"name":"MESOS_SUBSCRIPTION_BACKOFF_MAX","value":"2secs"},{"name":"MESOS_SANDBOX","value":"\/tmp\/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT\/slaves\/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0\/frameworks\/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000\/executors\/b969d109-e0fb-4308-9290-92b22ae8c4b7\/runs\/5ca261a9-7655-4473-ad23-4638c856c858"}]},"user":"mesos","working_directory":"\/tmp\/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT\/slaves\/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0\/frameworks\/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000\/executors\/b969d109-e0fb-4308-9290-92b22ae8c4b7\/runs\/5ca261a9-7655-4473-ad23-4638c856c858"}"
--pipe_read="7" --pipe_write="8"
--runtime_directory="/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_IX53a9/containers/5ca261a9-7655-4473-ad23-4638c856c858"
--unshare_namespace_mnt="false"'
I0111 18:52:55.736647 27081 launcher.cpp:133] Forked child with pid '30914' for
container '5ca261a9-7655-4473-ad23-4638c856c858'
I0111 18:52:55.736830 27081 containerizer.cpp:1639] Checkpointing container's
forked pid 30914 to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858/pids/forked.pid'
I0111 18:52:55.738760 27080 fetcher.cpp:349] Starting to fetch URIs for
container: 5ca261a9-7655-4473-ad23-4638c856c858, directory:
/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858
I0111 18:52:55.828845 30921 exec.cpp:162] Version: 1.2.0
I0111 18:52:55.833731 27078 slave.cpp:3322] Got registration for executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 from executor(1)@172.17.0.3:44874
I0111 18:52:55.833977 27078 slave.cpp:3408] Checkpointing executor pid
'executor(1)@172.17.0.3:44874' to
'/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_XwgqaT/meta/slaves/f193ab26-63f6-4284-9de0-cd89bb73adb4-S0/frameworks/f193ab26-63f6-4284-9de0-cd89bb73adb4-0000/executors/b969d109-e0fb-4308-9290-92b22ae8c4b7/runs/5ca261a9-7655-4473-ad23-4638c856c858/pids/libprocess.pid'
I0111 18:52:55.835109 30923 exec.cpp:237] Executor registered on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0
I0111 18:52:55.835211 27078 slave.cpp:2267] Sending queued task
'b969d109-e0fb-4308-9290-92b22ae8c4b7' to executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 at executor(1)@172.17.0.3:44874
Received SUBSCRIBED event
Subscribed executor on b09aa0a37318
Received LAUNCH event
Starting task b969d109-e0fb-4308-9290-92b22ae8c4b7
/mesos/build/src/mesos-containerizer launch --help="false"
--launch_info="{"command":{"shell":true,"value":"sleep 1000"}}"
--unshare_namespace_mnt="false"
Forked command at 30933
I0111 18:52:55.842718 27082 slave.cpp:3754] Handling status update TASK_RUNNING
(UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 from executor(1)@172.17.0.3:44874
I0111 18:52:55.843504 27073 status_update_manager.cpp:323] Received status
update TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.843534 27073 status_update_manager.cpp:500] Creating
StatusUpdate stream for task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.843977 27073 status_update_manager.cpp:832] Checkpointing UPDATE
for status update TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for
task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.844115 27073 status_update_manager.cpp:377] Forwarding update
TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 to the agent
I0111 18:52:55.844358 27073 slave.cpp:4195] Forwarding the update TASK_RUNNING
(UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 to [email protected]:42061
I0111 18:52:55.844611 27073 slave.cpp:4089] Status update manager successfully
handled status update TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8)
for task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.844825 27073 slave.cpp:4105] Sending acknowledgement for status
update TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 to executor(1)@172.17.0.3:44874
I0111 18:52:55.844782 27086 master.cpp:5848] Status update TASK_RUNNING (UUID:
6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 from agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.844988 27086 master.cpp:5910] Forwarding status update
TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.845077 27086 master.cpp:7953] Updating the state of task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (latest state: TASK_RUNNING, status
update state: TASK_RUNNING)
I0111 18:52:55.845196 27086 sched.cpp:1041] Scheduler::statusUpdate took 24432ns
I0111 18:52:55.845324 27086 master.cpp:4950] Processing ACKNOWLEDGE call
6fbee882-8439-4c01-91b5-49b7955349f8 for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (default) at
[email protected]:42061 on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0
I0111 18:52:55.845535 27086 status_update_manager.cpp:395] Received status
update acknowledgement (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.845602 27086 status_update_manager.cpp:832] Checkpointing ACK
for status update TASK_RUNNING (UUID: 6fbee882-8439-4c01-91b5-49b7955349f8) for
task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.845835 27086 slave.cpp:3042] Status update manager successfully
handled status update acknowledgement (UUID:
6fbee882-8439-4c01-91b5-49b7955349f8) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.846096 27075 containerizer.cpp:2118] Destroying container
5ca261a9-7655-4473-ad23-4638c856c858 in RUNNING state
I0111 18:52:55.846189 27075 launcher.cpp:149] Asked to destroy container
5ca261a9-7655-4473-ad23-4638c856c858
I0111 18:52:55.853996 27075 slave.cpp:4327] Got exited event for
executor(1)@172.17.0.3:44874
I0111 18:52:55.882633 27079 containerizer.cpp:2481] Container
5ca261a9-7655-4473-ad23-4638c856c858 has exited
I0111 18:52:55.883587 27079 provisioner.cpp:322] Ignoring destroy request for
unknown container 5ca261a9-7655-4473-ad23-4638c856c858
I0111 18:52:55.884263 27080 slave.cpp:4690] Executor
'b969d109-e0fb-4308-9290-92b22ae8c4b7' of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 terminated with signal Killed
I0111 18:52:55.884340 27080 slave.cpp:3754] Handling status update TASK_FAILED
(UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 from @0.0.0.0:0
W0111 18:52:55.884745 27083 containerizer.cpp:1933] Ignoring update for unknown
container 5ca261a9-7655-4473-ad23-4638c856c858
I0111 18:52:55.885037 27074 status_update_manager.cpp:323] Received status
update TASK_FAILED (UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.885092 27074 status_update_manager.cpp:832] Checkpointing UPDATE
for status update TASK_FAILED (UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for
task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.885257 27074 status_update_manager.cpp:377] Forwarding update
TASK_FAILED (UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 to the agent
I0111 18:52:55.885406 27082 slave.cpp:4195] Forwarding the update TASK_FAILED
(UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 to [email protected]:42061
I0111 18:52:55.885622 27082 slave.cpp:796] Agent terminating
I0111 18:52:55.885812 27074 master.cpp:5848] Status update TASK_FAILED (UUID:
a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 from agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.885972 27074 master.cpp:5910] Forwarding status update
TASK_FAILED (UUID: a641ba05-c0cc-4d51-9bc7-5d8dae71afce) for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.886055 27074 master.cpp:7953] Updating the state of task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
/mesos/src/tests/slave_recovery_tests.cpp:2799: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffc45655d90, @0x2ac116c60810 120-byte object
<50-6B 53-0D C1-2A 00-00 00-00 00-00 00-00 00-00 DF-13 00-00 00-00 00-00 00-55
00-34 C1-2A 00-00 70-23 00-34 C1-2A 00-00 03-00 00-00 01-00 00-00 D0-C1 73-02
00-00 00-00 B0-6D 01-34 C1-2A 00-00 30-4A 02-34 C1-2A 00-00 01-00 00-00 00-2A
00-00 D3-A0 D4-CE 87-2B D6-41 10-12 00-34 C1-2A 00-00 00-00 00-00 00-00 00-00
D0-ED 00-34 C1-2A 00-00 00-00 00-00 00-00 00-00>)
Expected: to be called once
Actual: called twice - over-saturated and active
I0111 18:52:55.886711 27082 hierarchical.cpp:1024] Recovered cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {}) on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 from framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000
I0111 18:52:55.886860 27074 sched.cpp:1041] Scheduler::statusUpdate took
505379ns
I0111 18:52:55.886862 27085 master.cpp:1261] Agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) disconnected
I0111 18:52:55.887459 27085 master.cpp:3051] Disconnecting agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.887506 27085 master.cpp:3070] Deactivating agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
W0111 18:52:55.887595 27085 master.cpp:4942] Cannot send status update
acknowledgement a641ba05-c0cc-4d51-9bc7-5d8dae71afce for task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (default) at
[email protected]:42061 to agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) because agent is disconnected
I0111 18:52:55.887657 27085 hierarchical.cpp:590] Agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 deactivated
I0111 18:52:55.888365 27085 master.cpp:5195] Removing old disconnected agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318) because a registration attempt occurred
I0111 18:52:55.888388 27085 master.cpp:7736] Removing agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318): a new agent registered at the same address
I0111 18:52:55.888535 27085 master.cpp:5234] Registering agent at
slave(485)@172.17.0.3:42061 (b09aa0a37318) with id
f193ab26-63f6-4284-9de0-cd89bb73adb4-S1
I0111 18:52:55.888734 27085 registrar.cpp:461] Applied 1 operations in 30248ns;
attempting to update the registry
I0111 18:52:55.889088 27085 registrar.cpp:506] Successfully updated the
registry in 321024ns
I0111 18:52:55.889161 27085 registrar.cpp:461] Applied 1 operations in 10441ns;
attempting to update the registry
I0111 18:52:55.889709 27085 registrar.cpp:506] Successfully updated the
registry in 393984ns
I0111 18:52:55.889269 27076 master.cpp:7778] Removed agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318): a new agent registered at the same address
I0111 18:52:55.889883 27076 master.cpp:7953] Updating the state of task
b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (latest state: TASK_FAILED, status
update state: TASK_LOST)
I0111 18:52:55.889982 27076 master.cpp:8049] Removing task
b969d109-e0fb-4308-9290-92b22ae8c4b7 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 on agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 at slave(485)@172.17.0.3:42061
(b09aa0a37318)
I0111 18:52:55.890242 27076 master.cpp:5905] Sending status update TASK_LOST
for task b969d109-e0fb-4308-9290-92b22ae8c4b7 of framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 'Agent b09aa0a37318 removed: a new
agent registered at the same address'
I0111 18:52:55.890367 27076 master.cpp:2036] Notifying framework
f193ab26-63f6-4284-9de0-cd89bb73adb4-0000 (default) at
[email protected]:42061 of lost agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S0 (b09aa0a37318)
I0111 18:52:55.890631 27076 master.cpp:5305] Registered agent
f193ab26-63f6-4284-9de0-cd89bb73adb4-S1 at slave(485)@172.17.0.3:42061
(b09aa0a37318) with cpus(*):2; mem(*):1024; :
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)