Vinod Kone created MESOS-3770:
---------------------------------
Summary: SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky
Key: MESOS-3770
URL: https://issues.apache.org/jira/browse/MESOS-3770
Project: Mesos
Issue Type: Bug
Components: test
Affects Versions: 0.25.0
Reporter: Vinod Kone
Observed this in internal CI
{code}
DEBUG: [ RUN ] SlaveRecoveryTest/0.RecoverCompletedExecutor
DEBUG: Using temporary directory
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B'
DEBUG: I1020 08:56:36.634321 28115 leveldb.cpp:176] Opened db in 185.662339ms
DEBUG: I1020 08:56:36.701638 28115 leveldb.cpp:183] Compacted db in 67.257643ms
DEBUG: I1020 08:56:36.701705 28115 leveldb.cpp:198] Created db iterator in
8212ns
DEBUG: I1020 08:56:36.701719 28115 leveldb.cpp:204] Seeked to beginning of db
in 1417ns
DEBUG: I1020 08:56:36.701730 28115 leveldb.cpp:273] Iterated through 0 keys in
the db in 357ns
DEBUG: I1020 08:56:36.701756 28115 replica.cpp:746] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
DEBUG: I1020 08:56:36.702062 28132 recover.cpp:449] Starting replica recovery
DEBUG: I1020 08:56:36.702116 28132 recover.cpp:475] Replica is in EMPTY status
DEBUG: I1020 08:56:36.702952 28132 replica.cpp:642] Replica in EMPTY status
received a broadcasted recover request from (7143)@172.16.132.117:37586
DEBUG: I1020 08:56:36.703795 28141 recover.cpp:195] Received a recover response
from a replica in EMPTY status
DEBUG: I1020 08:56:36.704100 28138 recover.cpp:566] Updating replica status to
STARTING
DEBUG: I1020 08:56:36.705229 28133 master.cpp:376] Master
0d54e2f1-43d7-4f74-8532-9c37ac40791b (smfc-ahy-19-sr2.corpdc.twitter.com)
started on 172.16.132.117:37586
DEBUG: I1020 08:56:36.705289 28133 master.cpp:378] Flags at startup: --acls=""
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5"
--authorizers="local"
--credentials="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5"
--quiet="false" --recovery_slave_removal_limit="100%"
--registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="25secs" --registry_strict="true"
--root_submissions="true" --slave_ping_timeout="15secs"
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui"
--work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/master"
--zk_session_timeout="10secs"
DEBUG: I1020 08:56:36.705440 28133 master.cpp:423] Master only allowing
authenticated frameworks to register
DEBUG: I1020 08:56:36.705446 28133 master.cpp:428] Master only allowing
authenticated slaves to register
DEBUG: I1020 08:56:36.705451 28133 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials'
DEBUG: I1020 08:56:36.705587 28133 master.cpp:467] Using default 'crammd5'
authenticator
DEBUG: I1020 08:56:36.705651 28133 master.cpp:504] Authorization enabled
DEBUG: I1020 08:56:36.706521 28134 master.cpp:1609] The newly elected leader is
[email protected]:37586 with id 0d54e2f1-43d7-4f74-8532-9c37ac40791b
DEBUG: I1020 08:56:36.706539 28134 master.cpp:1622] Elected as the leading
master!
DEBUG: I1020 08:56:36.706545 28134 master.cpp:1382] Recovering from registrar
DEBUG: I1020 08:56:36.706681 28146 registrar.cpp:309] Recovering registrar
DEBUG: I1020 08:56:36.768453 28138 leveldb.cpp:306] Persisting metadata (8
bytes) to leveldb took 64.300669ms
DEBUG: I1020 08:56:36.768492 28138 replica.cpp:323] Persisted replica status to
STARTING
DEBUG: I1020 08:56:36.768568 28138 recover.cpp:475] Replica is in STARTING
status
DEBUG: I1020 08:56:36.769737 28131 replica.cpp:642] Replica in STARTING status
received a broadcasted recover request from (7144)@172.16.132.117:37586
DEBUG: I1020 08:56:36.769816 28131 recover.cpp:195] Received a recover response
from a replica in STARTING status
DEBUG: I1020 08:56:36.770355 28141 recover.cpp:566] Updating replica status to
VOTING
DEBUG: I1020 08:56:36.818709 28136 leveldb.cpp:306] Persisting metadata (8
bytes) to leveldb took 48.054479ms
DEBUG: I1020 08:56:36.818743 28136 replica.cpp:323] Persisted replica status to
VOTING
DEBUG: I1020 08:56:36.818791 28136 recover.cpp:580] Successfully joined the
Paxos group
DEBUG: I1020 08:56:36.818842 28136 recover.cpp:464] Recover process terminated
DEBUG: I1020 08:56:36.818954 28130 log.cpp:661] Attempting to start the writer
DEBUG: I1020 08:56:36.820060 28140 replica.cpp:478] Replica received implicit
promise request from (7145)@172.16.132.117:37586 with proposal 1
DEBUG: I1020 08:56:36.885488 28140 leveldb.cpp:306] Persisting metadata (8
bytes) to leveldb took 65.391754ms
DEBUG: I1020 08:56:36.885524 28140 replica.cpp:345] Persisted promised to 1
DEBUG: I1020 08:56:36.885898 28148 coordinator.cpp:231] Coordinator attemping
to fill missing position
DEBUG: I1020 08:56:36.888197 28143 replica.cpp:378] Replica received explicit
promise request from (7146)@172.16.132.117:37586 for position 0 with proposal 2
DEBUG: I1020 08:56:36.952364 28143 leveldb.cpp:343] Persisting action (8 bytes)
to leveldb took 64.11496ms
DEBUG: I1020 08:56:36.952407 28143 replica.cpp:681] Persisted action at 0
DEBUG: I1020 08:56:36.953027 28139 replica.cpp:512] Replica received write
request for position 0 from (7147)@172.16.132.117:37586
DEBUG: I1020 08:56:36.953059 28139 leveldb.cpp:438] Reading position from
leveldb took 14203ns
DEBUG: I1020 08:56:36.994161 28139 leveldb.cpp:343] Persisting action (14
bytes) to leveldb took 41.082908ms
DEBUG: I1020 08:56:36.994199 28139 replica.cpp:681] Persisted action at 0
DEBUG: I1020 08:56:36.994415 28143 replica.cpp:660] Replica received learned
notice for position 0 from @0.0.0.0:0
DEBUG: I1020 08:56:37.036006 28143 leveldb.cpp:343] Persisting action (16
bytes) to leveldb took 41.569884ms
DEBUG: I1020 08:56:37.036043 28143 replica.cpp:681] Persisted action at 0
DEBUG: I1020 08:56:37.036056 28143 replica.cpp:666] Replica learned NOP action
at position 0
DEBUG: I1020 08:56:37.036365 28138 log.cpp:677] Writer started with ending
position 0
DEBUG: I1020 08:56:37.037250 28133 leveldb.cpp:438] Reading position from
leveldb took 45165ns
DEBUG: I1020 08:56:37.038321 28131 registrar.cpp:342] Successfully fetched the
registry (0B) in 331.61216ms
DEBUG: I1020 08:56:37.038378 28131 registrar.cpp:441] Applied 1 operations in
6704ns; attempting to update the 'registry'
DEBUG: I1020 08:56:37.038996 28132 log.cpp:685] Attempting to append 228 bytes
to the log
DEBUG: I1020 08:56:37.039103 28132 coordinator.cpp:341] Coordinator attempting
to write APPEND action at position 1
DEBUG: I1020 08:56:37.039717 28152 replica.cpp:512] Replica received write
request for position 1 from (7148)@172.16.132.117:37586
DEBUG: I1020 08:56:37.111176 28152 leveldb.cpp:343] Persisting action (247
bytes) to leveldb took 71.428059ms
DEBUG: I1020 08:56:37.111225 28152 replica.cpp:681] Persisted action at 1
DEBUG: I1020 08:56:37.111547 28138 replica.cpp:660] Replica received learned
notice for position 1 from @0.0.0.0:0
DEBUG: I1020 08:56:37.177995 28138 leveldb.cpp:343] Persisting action (249
bytes) to leveldb took 66.426857ms
DEBUG: I1020 08:56:37.178033 28138 replica.cpp:681] Persisted action at 1
DEBUG: I1020 08:56:37.178045 28138 replica.cpp:666] Replica learned APPEND
action at position 1
DEBUG: I1020 08:56:37.178407 28137 registrar.cpp:486] Successfully updated the
'registry' in 139.99104ms
DEBUG: I1020 08:56:37.178473 28137 registrar.cpp:372] Successfully recovered
registrar
DEBUG: I1020 08:56:37.178496 28146 log.cpp:704] Attempting to truncate the log
to 1
DEBUG: I1020 08:56:37.178966 28145 master.cpp:1419] Recovered 0 slaves from the
Registry (189B) ; allowing 10mins for slaves to re-register
DEBUG: I1020 08:56:37.178951 28140 coordinator.cpp:341] Coordinator attempting
to write TRUNCATE action at position 2
DEBUG: I1020 08:56:37.180043 28132 replica.cpp:512] Replica received write
request for position 2 from (7149)@172.16.132.117:37586
DEBUG: I1020 08:56:37.219796 28132 leveldb.cpp:343] Persisting action (16
bytes) to leveldb took 39.72368ms
DEBUG: I1020 08:56:37.219825 28132 replica.cpp:681] Persisted action at 2
DEBUG: I1020 08:56:37.219990 28132 replica.cpp:660] Replica received learned
notice for position 2 from @0.0.0.0:0
DEBUG: I1020 08:56:37.278291 28132 leveldb.cpp:343] Persisting action (18
bytes) to leveldb took 58.280037ms
DEBUG: I1020 08:56:37.278342 28132 leveldb.cpp:401] Deleting ~1 keys from
leveldb took 21056ns
DEBUG: I1020 08:56:37.278355 28132 replica.cpp:681] Persisted action at 2
DEBUG: I1020 08:56:37.278363 28132 replica.cpp:666] Replica learned TRUNCATE
action at position 2
DEBUG: I1020 08:56:37.281244 28115 containerizer.cpp:143] Using isolation:
posix/cpu,posix/mem,filesystem/posix
DEBUG: W1020 08:56:37.281390 28115 backend.cpp:50] Failed to create 'bind'
backend: BindBackend requires root privileges
DEBUG: I1020 08:56:37.284584 28153 slave.cpp:191] Slave started on
255)@172.16.132.117:37586
DEBUG: I1020 08:56:37.284600 28153 slave.cpp:192] Flags at startup:
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos"
--credential="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true"
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
--egress_unique_flow_per_container="false"
--enforce_container_disk_quota="false" --ephemeral_ports_per_container="1024"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --image_provisioner_backend="copy"
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/builddir/build/BUILD/mesos-0.26.0/src" --logbufsecs="0"
--logging_level="INFO" --network_enable_socket_statistics_details="false"
--network_enable_socket_statistics_summary="false"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox"
--strict="true" --switch_user="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx"
DEBUG: I1020 08:56:37.284816 28153 credentials.hpp:85] Loading credential for
authentication from
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential'
DEBUG: I1020 08:56:37.284910 28153 slave.cpp:322] Slave using credential for:
test-principal
DEBUG: I1020 08:56:37.285879 28153 slave.cpp:392] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
DEBUG: I1020 08:56:37.285928 28153 slave.cpp:400] Slave attributes: [ ]
DEBUG: I1020 08:56:37.285943 28153 slave.cpp:405] Slave hostname:
smfc-ahy-19-sr2.corpdc.twitter.com
DEBUG: I1020 08:56:37.285955 28153 slave.cpp:410] Slave checkpoint: true
DEBUG: I1020 08:56:37.286746 28115 sched.cpp:166] Version: 0.26.0-rc0
DEBUG: I1020 08:56:37.286973 28151 state.cpp:54] Recovering state from
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta'
DEBUG: I1020 08:56:37.287063 28143 status_update_manager.cpp:202] Recovering
status update manager
DEBUG: I1020 08:56:37.287384 28137 containerizer.cpp:385] Recovering
containerizer
DEBUG: I1020 08:56:37.287608 28139 sched.cpp:264] New master detected at
[email protected]:37586
DEBUG: I1020 08:56:37.287746 28139 sched.cpp:320] Authenticating with master
[email protected]:37586
DEBUG: I1020 08:56:37.287757 28139 sched.cpp:327] Using default CRAM-MD5
authenticatee
DEBUG: I1020 08:56:37.287824 28135 slave.cpp:4222] Finished recovery
DEBUG: I1020 08:56:37.287967 28149 authenticatee.cpp:115] Creating new client
SASL connection
DEBUG: I1020 08:56:37.288192 28139 master.cpp:5144] Authenticating
[email protected]:37586
DEBUG: I1020 08:56:37.288382 28136 status_update_manager.cpp:176] Pausing
sending status updates
DEBUG: I1020 08:56:37.288444 28140 slave.cpp:726] New master detected at
[email protected]:37586
DEBUG: I1020 08:56:37.288467 28140 slave.cpp:789] Authenticating with master
[email protected]:37586
DEBUG: I1020 08:56:37.288475 28140 slave.cpp:794] Using default CRAM-MD5
authenticatee
DEBUG: I1020 08:56:37.288508 28140 slave.cpp:762] Detecting new master
DEBUG: I1020 08:56:37.288604 28130 authenticatee.cpp:115] Creating new client
SASL connection
DEBUG: I1020 08:56:37.288717 28140 authenticator.cpp:92] Creating new server
SASL connection
DEBUG: I1020 08:56:37.288785 28139 master.cpp:5144] Authenticating
slave(255)@172.16.132.117:37586
DEBUG: I1020 08:56:37.288861 28135 authenticatee.cpp:206] Received SASL
authentication mechanisms: CRAM-MD5
DEBUG: I1020 08:56:37.288882 28135 authenticatee.cpp:232] Attempting to
authenticate with mechanism 'CRAM-MD5'
DEBUG: I1020 08:56:37.289080 28133 authenticator.cpp:92] Creating new server
SASL connection
DEBUG: I1020 08:56:37.289175 28131 authenticatee.cpp:206] Received SASL
authentication mechanisms: CRAM-MD5
DEBUG: I1020 08:56:37.289199 28131 authenticatee.cpp:232] Attempting to
authenticate with mechanism 'CRAM-MD5'
DEBUG: I1020 08:56:37.289355 28144 authenticator.cpp:197] Received SASL
authentication start
DEBUG: I1020 08:56:37.289371 28140 authenticator.cpp:197] Received SASL
authentication start
DEBUG: I1020 08:56:37.289440 28144 authenticator.cpp:319] Authentication
requires more steps
DEBUG: I1020 08:56:37.289459 28140 authenticator.cpp:319] Authentication
requires more steps
DEBUG: I1020 08:56:37.289480 28144 authenticatee.cpp:252] Received SASL
authentication step
DEBUG: I1020 08:56:37.289489 28140 authenticatee.cpp:252] Received SASL
authentication step
DEBUG: I1020 08:56:37.289813 28132 authenticator.cpp:225] Received SASL
authentication step
DEBUG: I1020 08:56:37.289852 28132 authenticator.cpp:311] Authentication success
DEBUG: I1020 08:56:37.289898 28143 authenticatee.cpp:292] Authentication success
DEBUG: I1020 08:56:37.289971 28142 master.cpp:5174] Successfully authenticated
principal 'test-principal' at
[email protected]:37586
DEBUG: I1020 08:56:37.290009 28143 sched.cpp:409] Successfully authenticated
with master [email protected]:37586
DEBUG: I1020 08:56:37.290163 28137 authenticator.cpp:225] Received SASL
authentication step
DEBUG: I1020 08:56:37.290220 28137 authenticator.cpp:311] Authentication success
DEBUG: I1020 08:56:37.290266 28134 master.cpp:2185] Received SUBSCRIBE call for
framework 'default' at
[email protected]:37586
DEBUG: I1020 08:56:37.290289 28134 master.cpp:1648] Authorizing framework
principal 'test-principal' to receive offers for role '*'
DEBUG: I1020 08:56:37.290405 28153 authenticatee.cpp:292] Authentication success
DEBUG: I1020 08:56:37.290431 28144 master.cpp:5174] Successfully authenticated
principal 'test-principal' at slave(255)@172.16.132.117:37586
DEBUG: I1020 08:56:37.290464 28144 master.cpp:2256] Subscribing framework
default with checkpointing enabled and capabilities [ ]
DEBUG: I1020 08:56:37.290654 28136 hierarchical.cpp:185] Added framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.290650 28148 slave.cpp:857] Successfully authenticated
with master [email protected]:37586
DEBUG: I1020 08:56:37.290817 28144 master.cpp:3868] Registering slave at
slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) with id
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:37.290873 28151 sched.cpp:643] Framework registered with
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.290930 28130 registrar.cpp:441] Applied 1 operations in
16086ns; attempting to update the 'registry'
DEBUG: I1020 08:56:37.291250 28151 log.cpp:685] Attempting to append 419 bytes
to the log
DEBUG: I1020 08:56:37.291321 28139 coordinator.cpp:341] Coordinator attempting
to write APPEND action at position 3
DEBUG: I1020 08:56:37.291939 28131 replica.cpp:512] Replica received write
request for position 3 from (7161)@172.16.132.117:37586
DEBUG: I1020 08:56:37.300041 28136 master.cpp:3856] Ignoring register slave
message from slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress
DEBUG: I1020 08:56:37.311420 28141 master.cpp:3856] Ignoring register slave
message from slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress
DEBUG: I1020 08:56:37.341991 28130 master.cpp:3856] Ignoring register slave
message from slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress
DEBUG: I1020 08:56:37.354004 28131 leveldb.cpp:343] Persisting action (438
bytes) to leveldb took 62.03948ms
DEBUG: I1020 08:56:37.354042 28131 replica.cpp:681] Persisted action at 3
DEBUG: I1020 08:56:37.354390 28136 replica.cpp:660] Replica received learned
notice for position 3 from @0.0.0.0:0
DEBUG: I1020 08:56:37.378435 28130 master.cpp:3856] Ignoring register slave
message from slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress
DEBUG: I1020 08:56:37.395829 28136 leveldb.cpp:343] Persisting action (440
bytes) to leveldb took 41.417781ms
DEBUG: I1020 08:56:37.395867 28136 replica.cpp:681] Persisted action at 3
DEBUG: I1020 08:56:37.395879 28136 replica.cpp:666] Replica learned APPEND
action at position 3
DEBUG: I1020 08:56:37.396229 28136 registrar.cpp:486] Successfully updated the
'registry' in 105.257984ms
DEBUG: I1020 08:56:37.396466 28148 log.cpp:704] Attempting to truncate the log
to 3
DEBUG: I1020 08:56:37.396519 28151 coordinator.cpp:341] Coordinator attempting
to write TRUNCATE action at position 4
DEBUG: I1020 08:56:37.396827 28136 master.cpp:3936] Registered slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
DEBUG: I1020 08:56:37.396867 28137 slave.cpp:901] Registered with master
[email protected]:37586; given slave ID
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:37.396874 28152 hierarchical.cpp:335] Added slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com)
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: )
DEBUG: I1020 08:56:37.397186 28150 master.cpp:4973] Sending 1 offers to
framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586
DEBUG: I1020 08:56:37.397260 28132 status_update_manager.cpp:183] Resuming
sending status updates
DEBUG: I1020 08:56:37.397285 28136 replica.cpp:512] Replica received write
request for position 4 from (7162)@172.16.132.117:37586
DEBUG: I1020 08:56:37.397516 28137 slave.cpp:960] Forwarding total
oversubscribed resources
DEBUG: I1020 08:56:37.397769 28150 master.cpp:4278] Received update of slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) with total oversubscribed resources
DEBUG: I1020 08:56:37.397902 28149 hierarchical.cpp:391] Slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com)
updated with oversubscribed resources (total: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000])
DEBUG: I1020 08:56:37.399044 28152 master.cpp:2924] Processing ACCEPT call for
offers: [ 0d54e2f1-43d7-4f74-8532-9c37ac40791b-O0 ] on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) for framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586
DEBUG: I1020 08:56:37.399066 28152 master.cpp:2720] Authorizing framework
principal 'test-principal' to launch task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b
as user 'mockbuild'
DEBUG: I1020 08:56:37.399948 28147 master.hpp:176] Adding task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:37.399991 28147 master.cpp:3254] Launching task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586 with
resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:37.400190 28147 slave.cpp:1291] Got assigned task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b for framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.409716 28147 slave.cpp:1407] Launching task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b for framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.410130 28147 paths.cpp:417] Trying to chown
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
to user 'mockbuild'
DEBUG: I1020 08:56:37.418064 28147 slave.cpp:4994] Launching executor
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 with resources cpus(*):0.1; mem(*):32
in work directory
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
DEBUG: I1020 08:56:37.418360 28137 containerizer.cpp:639] Starting container
'2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for executor
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
'0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
DEBUG: I1020 08:56:37.418512 28147 slave.cpp:1625] Queuing task
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for executor
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
'0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.421762 28143 launcher.cpp:132] Forked child with pid
'40438' for container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
DEBUG: I1020 08:56:37.421979 28143 containerizer.cpp:872] Checkpointing
executor's forked pid 40438 to
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8/pids/forked.pid'
DEBUG: I1020 08:56:37.463163 28136 leveldb.cpp:343] Persisting action (16
bytes) to leveldb took 65.83211ms
DEBUG: I1020 08:56:37.463227 28136 replica.cpp:681] Persisted action at 4
DEBUG: I1020 08:56:37.464862 28136 replica.cpp:660] Replica received learned
notice for position 4 from @0.0.0.0:0
DEBUG: I1020 08:56:37.504696 40438 exec.cpp:136] Version: 0.26.0-rc0
DEBUG: I1020 08:56:37.506574 28150 slave.cpp:2415] Got registration for
executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415
DEBUG: I1020 08:56:37.575783 40482 exec.cpp:210] Executor registered on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:37.576037 28147 slave.cpp:1796] Sending queued task
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' to executor
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: Registered executor on smfc-ahy-19-sr2.corpdc.twitter.com
DEBUG: Starting task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b
DEBUG: sh -c 'exit 0'
DEBUG: Forked command at 40500
DEBUG: I1020 08:56:37.580664 28139 slave.cpp:2778] Handling status update
TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415
DEBUG: I1020 08:56:37.580994 28146 status_update_manager.cpp:322] Received
status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.581665 28146 status_update_manager.cpp:826] Checkpointing
UPDATE for status update TASK_RUNNING (UUID:
fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.623446 28136 leveldb.cpp:343] Persisting action (18
bytes) to leveldb took 158.550858ms
DEBUG: Command exited with status 0 (pid: 40500)
DEBUG: I1020 08:56:37.623538 28136 leveldb.cpp:401] Deleting ~2 keys from
leveldb took 41621ns
DEBUG: I1020 08:56:37.623558 28136 replica.cpp:681] Persisted action at 4
DEBUG: I1020 08:56:37.623571 28136 replica.cpp:666] Replica learned TRUNCATE
action at position 4
DEBUG: I1020 08:56:37.681676 28145 slave.cpp:2778] Handling status update
TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415
DEBUG: I1020 08:56:37.840840 28146 status_update_manager.cpp:322] Received
status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.840920 28146 status_update_manager.cpp:826] Checkpointing
UPDATE for status update TASK_FINISHED (UUID:
4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.840901 28148 slave.cpp:3090] Forwarding the update
TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to [email protected]:37586
DEBUG: I1020 08:56:37.841084 28148 slave.cpp:3020] Sending acknowledgement for
status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to executor(1)@172.16.132.117:41415
DEBUG: I1020 08:56:37.841506 28153 master.cpp:4421] Status update TASK_RUNNING
(UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:37.841531 28153 master.cpp:4460] Forwarding status update
TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.841815 28153 master.cpp:6086] Updating the latest state
of task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to TASK_FINISHED (status update
state: TASK_RUNNING)
DEBUG: I1020 08:56:37.842349 28131 hierarchical.cpp:739] Recovered cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 from framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.843065 28152 master.cpp:3612] Processing ACKNOWLEDGE call
fb1a8b73-5136-4a8f-ae92-9d26b13fb41b for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586 on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:37.932744 28146 status_update_manager.cpp:394] Received
status update acknowledgement (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.932793 28146 status_update_manager.cpp:826] Checkpointing
ACK for status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b)
for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:37.932839 28152 slave.cpp:3020] Sending acknowledgement for
status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to executor(1)@172.16.132.117:41415
DEBUG: 2015-10-20
08:56:37,948:28115(0x7f526963c700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:34917] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
DEBUG: I1020 08:56:38.091502 28144 slave.cpp:3090] Forwarding the update
TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to [email protected]:37586
DEBUG: I1020 08:56:38.091954 28152 master.cpp:4421] Status update TASK_FINISHED
(UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.092000 28152 master.cpp:4460] Forwarding status update
TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: E1020 08:56:38.092056 28152 master.cpp:6032] Ignoring status update for
the terminated task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b (TASK_FINISHED ->
TASK_FINISHED) of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.092725 28153 master.cpp:3612] Processing ACKNOWLEDGE call
4a38e137-c590-48f6-ba10-2e0ba4903ac5 for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586 on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:38.092818 28153 status_update_manager.cpp:394] Received
status update acknowledgement (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for
task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.092852 28153 status_update_manager.cpp:826] Checkpointing
ACK for status update TASK_FINISHED (UUID:
4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.691695 28137 containerizer.cpp:1278] Executor for
container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8' has exited
DEBUG: I1020 08:56:38.691723 28137 containerizer.cpp:1095] Destroying container
'2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
DEBUG: I1020 08:56:38.708005 28138 master.cpp:4973] Sending 1 offers to
framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586
DEBUG: I1020 08:56:38.715711 28141 slave.cpp:3542] Executor
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 exited with status 0
DEBUG: I1020 08:56:38.715741 28141 slave.cpp:3646] Cleaning up executor
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: tests/slave_recovery_tests.cpp:881: Failure
DEBUG: Mock function called more times than expected - returning directly.
DEBUG: Function call: statusUpdate(0x7fffabf3b0c0, @0x7f52f0b9d740 112-byte
object <F0-C1 40-FC 52-7F 00-00 00-00 00-00 00-00 00-00 B0-24 01-AC 52-7F 00-00
00-95 01-AC 52-7F 00-00 05-00 00-00 01-00 00-00 80-28 40-01 00-00 00-00 40-AF
01-AC 52-7F 00-00 00-00 00-00 00-00 00-00 09-00 00-00 00-00 00-00 8B-FB 6F-FD
6F-90 D5-41 20-AF 01-AC 52-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00
00-00 00-00 00-00 5F-03 00-00>)
DEBUG: Expected: to be called twice
DEBUG: Actual: called 3 times - over-saturated and active
DEBUG: I1020 08:56:38.716214 28141 slave.cpp:3735] Cleaning up framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.716245 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
for gc 6.99999171211852days in the future
DEBUG: I1020 08:56:38.716320 28151 status_update_manager.cpp:284] Closing
status update streams for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.716517 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b'
for gc 6.99999171165333days in the future
DEBUG: I1020 08:56:38.716562 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
for gc 6.99999171135407days in the future
DEBUG: I1020 08:56:38.716585 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b'
for gc 6.99999171074074days in the future
DEBUG: I1020 08:56:38.716604 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
for gc 6.99999171001185days in the future
DEBUG: I1020 08:56:38.716624 28131 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
for gc 6.9999917082637days in the future
DEBUG: I1020 08:56:38.716841 28115 slave.cpp:606] Slave terminating
DEBUG: I1020 08:56:38.716974 28150 master.cpp:1086] Slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) disconnected
DEBUG: I1020 08:56:38.716994 28150 master.cpp:2540] Disconnecting slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.717082 28150 master.cpp:2559] Deactivating slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.717128 28136 hierarchical.cpp:420] Slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 deactivated
DEBUG: I1020 08:56:38.717308 28136 hierarchical.cpp:739] Recovered cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 from framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.718613 28115 containerizer.cpp:143] Using isolation:
posix/cpu,posix/mem,filesystem/posix
DEBUG: W1020 08:56:38.718729 28115 backend.cpp:50] Failed to create 'bind'
backend: BindBackend requires root privileges
DEBUG: I1020 08:56:38.720690 28142 slave.cpp:191] Slave started on
256)@172.16.132.117:37586
DEBUG: I1020 08:56:38.720705 28142 slave.cpp:192] Flags at startup:
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos"
--credential="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true"
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
--egress_unique_flow_per_container="false"
--enforce_container_disk_quota="false" --ephemeral_ports_per_container="1024"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --image_provisioner_backend="copy"
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/builddir/build/BUILD/mesos-0.26.0/src" --logbufsecs="0"
--logging_level="INFO" --network_enable_socket_statistics_details="false"
--network_enable_socket_statistics_summary="false"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox"
--strict="true" --switch_user="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx"
DEBUG: I1020 08:56:38.720965 28142 credentials.hpp:85] Loading credential for
authentication from
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential'
DEBUG: I1020 08:56:38.721045 28142 slave.cpp:322] Slave using credential for:
test-principal
DEBUG: I1020 08:56:38.721573 28142 slave.cpp:392] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
DEBUG: I1020 08:56:38.721603 28142 slave.cpp:400] Slave attributes: [ ]
DEBUG: I1020 08:56:38.721609 28142 slave.cpp:405] Slave hostname:
smfc-ahy-19-sr2.corpdc.twitter.com
DEBUG: I1020 08:56:38.721613 28142 slave.cpp:410] Slave checkpoint: true
DEBUG: I1020 08:56:38.721895 28131 state.cpp:54] Recovering state from
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta'
DEBUG: I1020 08:56:38.721935 28131 state.cpp:681] No checkpointed resources
found at
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/resources/resources.info'
DEBUG: I1020 08:56:38.723804 28131 slave.cpp:4310] Recovering framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.723827 28131 slave.cpp:5103] Recovering executor
'0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.724068 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
for gc 6.99999162064889days in the future
DEBUG: I1020 08:56:38.724123 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8'
for gc 6.99999161984889days in the future
DEBUG: I1020 08:56:38.724180 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b'
for gc 6.99999161965333days in the future
DEBUG: I1020 08:56:38.724195 28131 slave.cpp:3735] Cleaning up framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.724220 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b'
for gc 6.99999161943111days in the future
DEBUG: I1020 08:56:38.724318 28142 status_update_manager.cpp:284] Closing
status update streams for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.724431 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
for gc 6.99999161661037days in the future
DEBUG: I1020 08:56:38.724465 28142 gc.cpp:56] Scheduling
'/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
for gc 6.99999161520889days in the future
DEBUG: I1020 08:56:38.724778 28140 status_update_manager.cpp:202] Recovering
status update manager
DEBUG: I1020 08:56:38.724791 28140 status_update_manager.cpp:210] Recovering
executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.725278 28132 containerizer.cpp:385] Recovering
containerizer
DEBUG: I1020 08:56:38.725953 28142 slave.cpp:4222] Finished recovery
DEBUG: I1020 08:56:38.726372 28131 status_update_manager.cpp:176] Pausing
sending status updates
DEBUG: I1020 08:56:38.726676 28133 slave.cpp:726] New master detected at
[email protected]:37586
DEBUG: I1020 08:56:38.726698 28133 slave.cpp:789] Authenticating with master
[email protected]:37586
DEBUG: I1020 08:56:38.726706 28133 slave.cpp:794] Using default CRAM-MD5
authenticatee
DEBUG: I1020 08:56:38.726750 28133 slave.cpp:762] Detecting new master
DEBUG: I1020 08:56:38.726832 28153 authenticatee.cpp:115] Creating new client
SASL connection
DEBUG: I1020 08:56:38.727023 28153 master.cpp:5144] Authenticating
slave(256)@172.16.132.117:37586
DEBUG: I1020 08:56:38.727404 28149 authenticator.cpp:92] Creating new server
SASL connection
DEBUG: I1020 08:56:38.727483 28149 authenticatee.cpp:206] Received SASL
authentication mechanisms: CRAM-MD5
DEBUG: I1020 08:56:38.727496 28149 authenticatee.cpp:232] Attempting to
authenticate with mechanism 'CRAM-MD5'
DEBUG: I1020 08:56:38.727525 28149 authenticator.cpp:197] Received SASL
authentication start
DEBUG: I1020 08:56:38.727591 28149 authenticator.cpp:319] Authentication
requires more steps
DEBUG: I1020 08:56:38.727670 28135 authenticatee.cpp:252] Received SASL
authentication step
DEBUG: I1020 08:56:38.727736 28135 authenticator.cpp:225] Received SASL
authentication step
DEBUG: I1020 08:56:38.727766 28135 authenticator.cpp:311] Authentication success
DEBUG: I1020 08:56:38.727821 28147 authenticatee.cpp:292] Authentication success
DEBUG: I1020 08:56:38.727826 28132 master.cpp:5174] Successfully authenticated
principal 'test-principal' at slave(256)@172.16.132.117:37586
DEBUG: I1020 08:56:38.728272 28144 slave.cpp:857] Successfully authenticated
with master [email protected]:37586
DEBUG: I1020 08:56:38.728466 28153 master.cpp:4028] Re-registering slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: W1020 08:56:38.728508 28153 master.cpp:5241] Task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 unknown to the slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) during re-registration: reconciling with
the slave
DEBUG: I1020 08:56:38.728565 28153 master.cpp:4216] Sending updated
checkpointed resources to slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at
slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.728612 28145 slave.cpp:1001] Re-registered with master
[email protected]:37586
DEBUG: I1020 08:56:38.728628 28140 hierarchical.cpp:408] Slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 reactivated
DEBUG: I1020 08:56:38.728693 28145 slave.cpp:1037] Forwarding total
oversubscribed resources
DEBUG: I1020 08:56:38.728703 28132 status_update_manager.cpp:183] Resuming
sending status updates
DEBUG: W1020 08:56:38.728718 28145 slave.cpp:1077] Slave reconciling task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 in state TASK_LOST: task unknown to
the slave
DEBUG: I1020 08:56:38.728761 28135 master.cpp:4278] Received update of slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com) with total oversubscribed resources
DEBUG: I1020 08:56:38.728778 28141 status_update_manager.cpp:322] Received
status update TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.728927 28135 hierarchical.cpp:391] Slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com)
updated with oversubscribed resources (total: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000], allocated: )
DEBUG: I1020 08:56:38.728973 28145 slave.cpp:2284] Updated checkpointed
resources from to
DEBUG: I1020 08:56:38.729003 28145 slave.cpp:3090] Forwarding the update
TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to [email protected]:37586
DEBUG: I1020 08:56:38.729135 28148 master.cpp:4973] Sending 1 offers to
framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at
[email protected]:37586
DEBUG: I1020 08:56:38.729395 28148 master.cpp:4421] Status update TASK_LOST
(UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.729423 28148 master.cpp:4460] Forwarding status update
TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: E1020 08:56:38.729467 28148 master.cpp:6032] Ignoring status update for
the terminated task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b (TASK_FINISHED ->
TASK_LOST) of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.729768 28115 sched.cpp:1781] Asked to stop the driver
DEBUG: I1020 08:56:38.729800 28115 master.cpp:925] Master terminating
DEBUG: I1020 08:56:38.729825 28115 master.cpp:6154] Removing task
0bbb69a3-3ba3-4016-86ed-4d80c1bf594b with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] of framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 on slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586
(smfc-ahy-19-sr2.corpdc.twitter.com)
DEBUG: I1020 08:56:38.729920 28137 hierarchical.cpp:364] Removed slave
0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0
DEBUG: I1020 08:56:38.729949 28135 sched.cpp:1042] Stopping framework
'0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000'
DEBUG: I1020 08:56:38.730121 28137 hierarchical.cpp:220] Removed framework
0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000
DEBUG: I1020 08:56:38.730507 28135 slave.cpp:3258] [email protected]:37586
exited
DEBUG: W1020 08:56:38.730526 28135 slave.cpp:3261] Master disconnected! Waiting
for a new master to be elected
DEBUG: [ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where
TypeParam = mesos::internal::slave::MesosContainerizer (2288 ms)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)