[
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034263#comment-15034263
]
Jan Schlicht commented on MESOS-3586:
-------------------------------------
I have to reopen this, as I've found the same behavior with 0.26-rc2 on
CentOS 7.1. I noticed some flakiness while running {{sudo ./bin/mesos-tests.sh}}
and could reproduce it by running {{sudo ./bin/mesos-tests.sh
--gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_Statistics"
--gtest_repeat=-1 --gtest_break_on_failure}} until it fails.
The SIGSEGV right after the failed expectation comes from
{{--gtest_break_on_failure}}, which deliberately crashes the test binary on the
first failure; it is not a separate crash.
Here's the verbose output of a failing run:
{noformat}
[ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
I1201 18:07:51.136508 18883 cgroups.cpp:2429] Freezing cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.144594 18886 cgroups.cpp:1411] Successfully froze cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
after 7.076864ms
I1201 18:07:51.151480 18882 cgroups.cpp:2447] Thawing cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.162557 18886 cgroups.cpp:1440] Successfullly thawed cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
after 11.026944ms
I1201 18:07:51.172379 18887 cgroups.cpp:2429] Freezing cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.183791 18881 cgroups.cpp:1411] Successfully froze cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after
7.8272ms
I1201 18:07:51.192354 18887 cgroups.cpp:2447] Thawing cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.199439 18885 cgroups.cpp:1440] Successfullly thawed cgroup
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after
7.028224ms
I1201 18:07:51.332849 18866 leveldb.cpp:176] Opened db in 6.74674ms
I1201 18:07:51.335450 18866 leveldb.cpp:183] Compacted db in 2.554513ms
I1201 18:07:51.335539 18866 leveldb.cpp:198] Created db iterator in 53851ns
I1201 18:07:51.335556 18866 leveldb.cpp:204] Seeked to beginning of db in 3455ns
I1201 18:07:51.335561 18866 leveldb.cpp:273] Iterated through 0 keys in the db
in 107ns
I1201 18:07:51.335666 18866 replica.cpp:780] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I1201 18:07:51.337374 18881 recover.cpp:449] Starting replica recovery
I1201 18:07:51.338235 18881 recover.cpp:475] Replica is in EMPTY status
I1201 18:07:51.340142 18880 replica.cpp:676] Replica in EMPTY status received a
broadcasted recover request from (14)@127.0.0.1:57652
I1201 18:07:51.340749 18882 recover.cpp:195] Received a recover response from a
replica in EMPTY status
I1201 18:07:51.340975 18885 master.cpp:367] Master
2f17d97c-de40-491e-9706-bf83a9ffd08c (centos71) started on 127.0.0.1:57652
I1201 18:07:51.341475 18884 recover.cpp:566] Updating replica status to STARTING
I1201 18:07:51.341152 18885 master.cpp:369] Flags at startup: --acls=""
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/ap4rPt/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5"
--quiet="false" --recovery_slave_removal_limit="100%"
--registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="25secs" --registry_strict="true"
--root_submissions="true" --slave_ping_timeout="15secs"
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ap4rPt/master"
--zk_session_timeout="10secs"
W1201 18:07:51.341752 18885 master.cpp:372]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers
or slaves. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1201 18:07:51.341794 18885 master.cpp:414] Master only allowing authenticated
frameworks to register
I1201 18:07:51.341804 18885 master.cpp:419] Master only allowing authenticated
slaves to register
I1201 18:07:51.341879 18885 credentials.hpp:37] Loading credentials for
authentication from '/tmp/ap4rPt/credentials'
I1201 18:07:51.345211 18885 master.cpp:458] Using default 'crammd5'
authenticator
I1201 18:07:51.345268 18882 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 3.5302ms
I1201 18:07:51.345289 18882 replica.cpp:323] Persisted replica status to
STARTING
I1201 18:07:51.345350 18885 authenticator.cpp:520] Initializing server SASL
I1201 18:07:51.345512 18882 recover.cpp:475] Replica is in STARTING status
I1201 18:07:51.346271 18885 master.cpp:495] Authorization enabled
I1201 18:07:51.346827 18886 replica.cpp:676] Replica in STARTING status
received a broadcasted recover request from (15)@127.0.0.1:57652
I1201 18:07:51.347117 18882 recover.cpp:195] Received a recover response from a
replica in STARTING status
I1201 18:07:51.347568 18886 recover.cpp:566] Updating replica status to VOTING
I1201 18:07:51.349238 18886 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 1.409364ms
I1201 18:07:51.349272 18886 replica.cpp:323] Persisted replica status to VOTING
I1201 18:07:51.349385 18886 recover.cpp:580] Successfully joined the Paxos group
I1201 18:07:51.349553 18886 recover.cpp:464] Recover process terminated
I1201 18:07:51.351759 18880 master.cpp:1606] The newly elected leader is
master@127.0.0.1:57652 with id 2f17d97c-de40-491e-9706-bf83a9ffd08c
I1201 18:07:51.351795 18880 master.cpp:1619] Elected as the leading master!
I1201 18:07:51.351820 18880 master.cpp:1379] Recovering from registrar
I1201 18:07:51.352057 18885 registrar.cpp:309] Recovering registrar
I1201 18:07:51.353216 18887 log.cpp:661] Attempting to start the writer
I1201 18:07:51.355137 18885 replica.cpp:496] Replica received implicit promise
request from (16)@127.0.0.1:57652 with proposal 1
I1201 18:07:51.357552 18885 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 2.373033ms
I1201 18:07:51.357578 18885 replica.cpp:345] Persisted promised to 1
I1201 18:07:51.358737 18881 coordinator.cpp:240] Coordinator attempting to fill
missing positions
I1201 18:07:51.360587 18887 replica.cpp:391] Replica received explicit promise
request from (17)@127.0.0.1:57652 for position 0 with proposal 2
I1201 18:07:51.361548 18887 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 918190ns
I1201 18:07:51.361569 18887 replica.cpp:715] Persisted action at 0
I1201 18:07:51.363291 18882 replica.cpp:540] Replica received write request for
position 0 from (18)@127.0.0.1:57652
I1201 18:07:51.363356 18882 leveldb.cpp:438] Reading position from leveldb took
40274ns
I1201 18:07:51.364213 18882 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 804446ns
I1201 18:07:51.364233 18882 replica.cpp:715] Persisted action at 0
I1201 18:07:51.365104 18884 replica.cpp:694] Replica received learned notice
for position 0 from @0.0.0.0:0
I1201 18:07:51.366173 18884 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 1.043935ms
I1201 18:07:51.366197 18884 replica.cpp:715] Persisted action at 0
I1201 18:07:51.366211 18884 replica.cpp:700] Replica learned NOP action at
position 0
I1201 18:07:51.367842 18887 log.cpp:677] Writer started with ending position 0
I1201 18:07:51.369870 18884 leveldb.cpp:438] Reading position from leveldb took
49993ns
I1201 18:07:51.372493 18882 registrar.cpp:342] Successfully fetched the
registry (0B) in 20.388096ms
I1201 18:07:51.372692 18882 registrar.cpp:441] Applied 1 operations in 69005ns;
attempting to update the 'registry'
I1201 18:07:51.376373 18880 log.cpp:685] Attempting to append 158 bytes to the
log
I1201 18:07:51.377168 18883 coordinator.cpp:350] Coordinator attempting to
write APPEND action at position 1
I1201 18:07:51.379091 18883 replica.cpp:540] Replica received write request for
position 1 from (19)@127.0.0.1:57652
I1201 18:07:51.380544 18883 leveldb.cpp:343] Persisting action (177 bytes) to
leveldb took 1.418125ms
I1201 18:07:51.380570 18883 replica.cpp:715] Persisted action at 1
I1201 18:07:51.382406 18885 replica.cpp:694] Replica received learned notice
for position 1 from @0.0.0.0:0
I1201 18:07:51.382995 18885 leveldb.cpp:343] Persisting action (179 bytes) to
leveldb took 563416ns
I1201 18:07:51.383013 18885 replica.cpp:715] Persisted action at 1
I1201 18:07:51.383025 18885 replica.cpp:700] Replica learned APPEND action at
position 1
I1201 18:07:51.387128 18885 registrar.cpp:486] Successfully updated the
'registry' in 14.34496ms
I1201 18:07:51.387258 18885 registrar.cpp:372] Successfully recovered registrar
I1201 18:07:51.387377 18882 log.cpp:704] Attempting to truncate the log to 1
I1201 18:07:51.387805 18886 coordinator.cpp:350] Coordinator attempting to
write TRUNCATE action at position 2
I1201 18:07:51.387814 18884 master.cpp:1416] Recovered 0 slaves from the
Registry (120B) ; allowing 10mins for slaves to re-register
I1201 18:07:51.389592 18885 replica.cpp:540] Replica received write request for
position 2 from (20)@127.0.0.1:57652
I1201 18:07:51.390261 18885 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 642820ns
I1201 18:07:51.390278 18885 replica.cpp:715] Persisted action at 2
I1201 18:07:51.392351 18882 replica.cpp:694] Replica received learned notice
for position 2 from @0.0.0.0:0
I1201 18:07:51.393007 18882 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 614947ns
I1201 18:07:51.393052 18882 leveldb.cpp:401] Deleting ~1 keys from leveldb took
27109ns
I1201 18:07:51.393064 18882 replica.cpp:715] Persisted action at 2
I1201 18:07:51.393076 18882 replica.cpp:700] Replica learned TRUNCATE action at
position 2
I1201 18:07:51.402930 18866 containerizer.cpp:142] Using isolation:
cgroups/mem,filesystem/posix
I1201 18:07:51.498821 18866 linux_launcher.cpp:103] Using
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1201 18:07:51.503211 18866 systemd.cpp:128] systemd version `208` detected
W1201 18:07:51.503242 18866 systemd.cpp:136] Required functionality `Delegate`
was introduced in Version `218`. Your system may not function properly; however
since some distributions have patched systemd packages, your system may still
be functional. This is why we keep running. See MESOS-3352 for more information
I1201 18:07:51.508138 18866 systemd.cpp:210] Started systemd slice
`mesos_executors.slice`
I1201 18:07:51.514886 18881 slave.cpp:191] Slave started on 1)@127.0.0.1:57652
I1201 18:07:51.514910 18881 slave.cpp:192] Flags at startup:
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos_test_c677185f-73c5-4af9-9029-007647d301f9"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--credential="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_auth_server="auth.docker.io" --docker_auth_server_port="443"
--docker_kill_orphans="true"
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local"
--docker_puller_timeout="60" --docker_registry="registry-1.docker.io"
--docker_registry_port="443" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/mesos/store/docker"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --image_provisioner_backend="copy"
--initialize_driver_logging="true" --isolation="cgroups/mem"
--launcher_dir="/home/vagrant/mesos/build-ssl/src" --logbufsecs="0"
--logging_level="INFO" --oversubscribed_resources_interval="15secs"
--perf_duration="10secs" --perf_interval="1mins"
--qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect"
--recovery_timeout="15mins" --registration_backoff_factor="10ms"
--resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox"
--strict="true" --switch_user="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL"
W1201 18:07:51.515254 18881 slave.cpp:195]
**************************************************
Slave bound to loopback interface! Cannot communicate with remote master(s).
You might want to set '--ip' flag to a routable IP address.
**************************************************
I1201 18:07:51.515266 18881 credentials.hpp:85] Loading credential for
authentication from
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/credential'
I1201 18:07:51.515554 18881 slave.cpp:322] Slave using credential for:
test-principal
W1201 18:07:51.516015 18866 sched.cpp:1542]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote
master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a
routable IP address.
**************************************************
I1201 18:07:51.518287 18881 slave.cpp:392] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1201 18:07:51.518360 18881 slave.cpp:400] Slave attributes: [ ]
I1201 18:07:51.518374 18881 slave.cpp:405] Slave hostname: centos71
I1201 18:07:51.518379 18881 slave.cpp:410] Slave checkpoint: true
I1201 18:07:51.518565 18866 sched.cpp:166] Version: 0.26.0
I1201 18:07:51.519347 18883 sched.cpp:264] New master detected at
master@127.0.0.1:57652
I1201 18:07:51.519405 18883 sched.cpp:320] Authenticating with master
master@127.0.0.1:57652
I1201 18:07:51.519417 18883 sched.cpp:327] Using default CRAM-MD5 authenticatee
I1201 18:07:51.519732 18886 authenticatee.cpp:99] Initializing client SASL
I1201 18:07:51.519870 18886 authenticatee.cpp:123] Creating new client SASL
connection
I1201 18:07:51.520319 18883 state.cpp:54] Recovering state from
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/meta'
I1201 18:07:51.520938 18884 master.cpp:5150] Authenticating
[email protected]:57652
I1201 18:07:51.521203 18887 status_update_manager.cpp:202] Recovering status
update manager
I1201 18:07:51.521390 18884 containerizer.cpp:384] Recovering containerizer
I1201 18:07:51.521517 18886 authenticator.cpp:100] Creating new server SASL
connection
I1201 18:07:51.522209 18886 authenticatee.cpp:214] Received SASL authentication
mechanisms: CRAM-MD5
I1201 18:07:51.522238 18886 authenticatee.cpp:240] Attempting to authenticate
with mechanism 'CRAM-MD5'
I1201 18:07:51.522457 18883 authenticator.cpp:205] Received SASL authentication
start
I1201 18:07:51.522548 18883 authenticator.cpp:327] Authentication requires more
steps
I1201 18:07:51.522692 18886 authenticatee.cpp:260] Received SASL authentication
step
I1201 18:07:51.522801 18886 authenticator.cpp:233] Received SASL authentication
step
I1201 18:07:51.522953 18886 authenticator.cpp:319] Authentication success
I1201 18:07:51.523102 18885 authenticatee.cpp:300] Authentication success
I1201 18:07:51.523355 18887 master.cpp:5180] Successfully authenticated
principal 'test-principal' at
[email protected]:57652
I1201 18:07:51.524238 18880 sched.cpp:409] Successfully authenticated with
master master@127.0.0.1:57652
I1201 18:07:51.524529 18881 master.cpp:2176] Received SUBSCRIBE call for
framework 'default' at
[email protected]:57652
I1201 18:07:51.526083 18881 master.cpp:1645] Authorizing framework principal
'test-principal' to receive offers for role '*'
I1201 18:07:51.526430 18882 master.cpp:2247] Subscribing framework default with
checkpointing disabled and capabilities [ ]
I1201 18:07:51.527278 18881 hierarchical.cpp:195] Added framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.527310 18882 sched.cpp:643] Framework registered with
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.532703 18887 slave.cpp:4230] Finished recovery
I1201 18:07:51.533692 18887 slave.cpp:729] New master detected at
master@127.0.0.1:57652
I1201 18:07:51.533741 18880 status_update_manager.cpp:176] Pausing sending
status updates
I1201 18:07:51.533759 18887 slave.cpp:792] Authenticating with master
master@127.0.0.1:57652
I1201 18:07:51.533769 18887 slave.cpp:797] Using default CRAM-MD5 authenticatee
I1201 18:07:51.533937 18887 slave.cpp:765] Detecting new master
I1201 18:07:51.534019 18884 authenticatee.cpp:123] Creating new client SASL
connection
I1201 18:07:51.535223 18885 master.cpp:5150] Authenticating
slave(1)@127.0.0.1:57652
I1201 18:07:51.535610 18885 authenticator.cpp:100] Creating new server SASL
connection
I1201 18:07:51.536231 18887 authenticatee.cpp:214] Received SASL authentication
mechanisms: CRAM-MD5
I1201 18:07:51.536253 18887 authenticatee.cpp:240] Attempting to authenticate
with mechanism 'CRAM-MD5'
I1201 18:07:51.536309 18887 authenticator.cpp:205] Received SASL authentication
start
I1201 18:07:51.536342 18887 authenticator.cpp:327] Authentication requires more
steps
I1201 18:07:51.536396 18887 authenticatee.cpp:260] Received SASL authentication
step
I1201 18:07:51.536480 18887 authenticator.cpp:233] Received SASL authentication
step
I1201 18:07:51.536531 18887 authenticator.cpp:319] Authentication success
I1201 18:07:51.536628 18885 authenticatee.cpp:300] Authentication success
I1201 18:07:51.536666 18884 master.cpp:5180] Successfully authenticated
principal 'test-principal' at slave(1)@127.0.0.1:57652
I1201 18:07:51.538077 18885 slave.cpp:860] Successfully authenticated with
master master@127.0.0.1:57652
I1201 18:07:51.538951 18882 master.cpp:3859] Registering slave at
slave(1)@127.0.0.1:57652 (centos71) with id
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.539645 18881 registrar.cpp:441] Applied 1 operations in 53988ns;
attempting to update the 'registry'
I1201 18:07:51.540289 18881 log.cpp:685] Attempting to append 324 bytes to the
log
I1201 18:07:51.540432 18887 coordinator.cpp:350] Coordinator attempting to
write APPEND action at position 3
I1201 18:07:51.542268 18886 replica.cpp:540] Replica received write request for
position 3 from (38)@127.0.0.1:57652
I1201 18:07:51.545014 18886 leveldb.cpp:343] Persisting action (343 bytes) to
leveldb took 2.706483ms
I1201 18:07:51.545053 18886 replica.cpp:715] Persisted action at 3
I1201 18:07:51.546170 18881 replica.cpp:694] Replica received learned notice
for position 3 from @0.0.0.0:0
I1201 18:07:51.547289 18881 leveldb.cpp:343] Persisting action (345 bytes) to
leveldb took 1.061009ms
I1201 18:07:51.547319 18881 replica.cpp:715] Persisted action at 3
I1201 18:07:51.547333 18881 replica.cpp:700] Replica learned APPEND action at
position 3
I1201 18:07:51.548413 18881 registrar.cpp:486] Successfully updated the
'registry' in 8.70016ms
I1201 18:07:51.548601 18881 log.cpp:704] Attempting to truncate the log to 3
I1201 18:07:51.549383 18883 coordinator.cpp:350] Coordinator attempting to
write TRUNCATE action at position 4
I1201 18:07:51.549446 18881 master.cpp:3927] Registered slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1201 18:07:51.549605 18886 hierarchical.cpp:344] Added slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71) with cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] (allocated: )
I1201 18:07:51.549808 18882 slave.cpp:904] Registered with master
master@127.0.0.1:57652; given slave ID 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.550792 18880 status_update_manager.cpp:183] Resuming sending
status updates
I1201 18:07:51.551144 18882 slave.cpp:963] Forwarding total oversubscribed
resources
I1201 18:07:51.551143 18885 replica.cpp:540] Replica received write request for
position 4 from (39)@127.0.0.1:57652
I1201 18:07:51.551512 18881 master.cpp:4979] Sending 1 offers to framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at
[email protected]:57652
I1201 18:07:51.551702 18881 master.cpp:4269] Received update of slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
with total oversubscribed resources
I1201 18:07:51.551928 18881 hierarchical.cpp:400] Slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71) updated with oversubscribed
resources (total: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000])
I1201 18:07:51.552104 18885 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 903203ns
I1201 18:07:51.552137 18885 replica.cpp:715] Persisted action at 4
I1201 18:07:51.552709 18886 replica.cpp:694] Replica received learned notice
for position 4 from @0.0.0.0:0
I1201 18:07:51.556046 18886 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 3.315992ms
I1201 18:07:51.556107 18886 leveldb.cpp:401] Deleting ~2 keys from leveldb took
36657ns
I1201 18:07:51.556152 18886 replica.cpp:715] Persisted action at 4
I1201 18:07:51.556174 18886 replica.cpp:700] Replica learned TRUNCATE action at
position 4
I1201 18:07:51.556725 18880 master.cpp:2915] Processing ACCEPT call for offers:
[ 2f17d97c-de40-491e-9706-bf83a9ffd08c-O0 ] on slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
for framework 2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at
[email protected]:57652
I1201 18:07:51.556794 18880 master.cpp:2711] Authorizing framework principal
'test-principal' to launch task 8e530058-d9c1-4c6c-8837-09269dbc616a as user
'root'
I1201 18:07:51.558148 18882 master.hpp:176] Adding task
8e530058-d9c1-4c6c-8837-09269dbc616a with resources cpus(*):1; mem(*):256;
disk(*):1024 on slave 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71)
I1201 18:07:51.558205 18882 master.cpp:3245] Launching task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at
[email protected]:57652 with resources
cpus(*):1; mem(*):256; disk(*):1024 on slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
I1201 18:07:51.558461 18886 slave.cpp:1294] Got assigned task
8e530058-d9c1-4c6c-8837-09269dbc616a for framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.558547 18885 hierarchical.cpp:744] Recovered cpus(*):1;
mem(*):768; ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):1; mem(*):256;
disk(*):1024) on slave 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 from framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.559119 18886 slave.cpp:1410] Launching task
8e530058-d9c1-4c6c-8837-09269dbc616a for framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.560716 18886 paths.cpp:436] Trying to chown
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/slaves/2f17d97c-de40-491e-9706-bf83a9ffd08c-S0/frameworks/2f17d97c-de40-491e-9706-bf83a9ffd08c-0000/executors/8e530058-d9c1-4c6c-8837-09269dbc616a/runs/3a2b1c72-96aa-469f-8c3e-c63b55b0375c'
to user 'root'
I1201 18:07:51.563762 18886 slave.cpp:4999] Launching executor
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 with resources cpus(*):0.1; mem(*):32
in work directory
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/slaves/2f17d97c-de40-491e-9706-bf83a9ffd08c-S0/frameworks/2f17d97c-de40-491e-9706-bf83a9ffd08c-0000/executors/8e530058-d9c1-4c6c-8837-09269dbc616a/runs/3a2b1c72-96aa-469f-8c3e-c63b55b0375c'
I1201 18:07:51.564254 18881 containerizer.cpp:618] Starting container
'3a2b1c72-96aa-469f-8c3e-c63b55b0375c' for executor
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework
'2f17d97c-de40-491e-9706-bf83a9ffd08c-0000'
I1201 18:07:51.564388 18886 slave.cpp:1628] Queuing task
'8e530058-d9c1-4c6c-8837-09269dbc616a' for executor
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.574303 18887 mem.cpp:605] Started listening for OOM events for
container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.577256 18887 mem.cpp:725] Started listening on low memory
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.579624 18887 mem.cpp:725] Started listening on medium memory
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.583400 18887 mem.cpp:725] Started listening on critical memory
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.585841 18887 mem.cpp:356] Updated 'memory.soft_limit_in_bytes'
to 288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.590293 18887 mem.cpp:391] Updated 'memory.limit_in_bytes' to
288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.597565 18885 linux_launcher.cpp:365] Cloning child process with
flags =
I1201 18:07:51.603770 18885 linux_launcher.cpp:422] Assigned child process
'18903' to 'mesos_executors.slice'
I1201 18:07:51.700537 18903 exec.cpp:136] Version: 0.26.0
I1201 18:07:51.706681 18882 slave.cpp:2405] Got registration for executor
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from executor(1)@127.0.0.1:47143
I1201 18:07:51.709386 18936 exec.cpp:210] Executor registered on slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
Registered executor on centos71
I1201 18:07:51.711493 18883 mem.cpp:356] Updated 'memory.soft_limit_in_bytes'
to 288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.714323 18885 slave.cpp:1793] Sending queued task
'8e530058-d9c1-4c6c-8837-09269dbc616a' to executor
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 at executor(1)@127.0.0.1:47143
Starting task 8e530058-d9c1-4c6c-8837-09269dbc616a
Forked command at 18939
sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
I1201 18:07:51.719348 18886 slave.cpp:2762] Handling status update TASK_RUNNING
(UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from executor(1)@127.0.0.1:47143
I1201 18:07:51.719849 18883 status_update_manager.cpp:322] Received status
update TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.720561 18886 slave.cpp:3087] Forwarding the update TASK_RUNNING
(UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 to master@127.0.0.1:57652
I1201 18:07:51.720746 18886 slave.cpp:3011] Sending acknowledgement for status
update TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 to executor(1)@127.0.0.1:47143
I1201 18:07:51.722455 18883 master.cpp:4414] Status update TASK_RUNNING (UUID:
6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
I1201 18:07:51.722486 18883 master.cpp:4462] Forwarding status update
TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.723788 18883 master.cpp:6066] Updating the state of task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (latest state: TASK_RUNNING, status
update state: TASK_RUNNING)
I1201 18:07:51.724521 18887 master.cpp:3571] Processing ACKNOWLEDGE call
6313338a-3f6c-491e-a410-a5fa9a747574 for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at
[email protected]:57652 on slave
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.725114 18886 status_update_manager.cpp:394] Received status
update acknowledgement (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task
8e530058-d9c1-4c6c-8837-09269dbc616a of framework
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
../../src/tests/containerizer/memory_pressure_tests.cpp:143: Failure
Expected: (usage.get().mem_low_pressure_counter()) >=
(usage.get().mem_medium_pressure_counter()), actual: 1 vs 6
*** Aborted at 1448993271 (unix time) try "date -d @1448993271" if you are
using GNU date ***
PC: @ 0x1405704 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 18866 (TID 0x7f73d529a8c0) from PID 0; stack
trace: ***
@ 0x7f73cddc6130 (unknown)
@ 0x1405704 testing::UnitTest::AddTestPartResult()
@ 0x13fa27b testing::internal::AssertHelper::operator=()
@ 0x13d9a23
mesos::internal::tests::MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_Test::TestBody()
@ 0x1423156
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x141df60
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x13ff539 testing::Test::Run()
@ 0x13ffcbc testing::TestInfo::Run()
@ 0x1400302 testing::TestCase::Run()
@ 0x1406bdc testing::internal::UnitTestImpl::RunAllTests()
@ 0x1423d7b
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x141eb0c
testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x1405922 testing::UnitTest::Run()
@ 0xd08e8c RUN_ALL_TESTS()
@ 0xd08a6a main
@ 0x7f73cc971af5 __libc_start_main
@ 0x90d909 (unknown)
I1201 18:07:52.039213 18934 exec.cpp:465] Slave exited ... shutting down
Shutting down
Sending SIGTERM to process tree at pid 18939
[vagrant@centos71 build-ssl]$ Killing the following process trees:
[
-+- 18939 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done
\--- 18940 dd count=512 bs=1M if=/dev/zero of=./temp
]
Command terminated with signal Terminated (pid: 18939)
E1201 18:07:52.152863 18935 process.cpp:1911] Failed to shutdown socket with fd
9: Transport endpoint is not connected
{noformat}
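For context, the memory limit in this log follows directly from the launched resources: the task has mem(*):256, the executor has mem(*):32, and the cgroups/mem isolator set 'memory.limit_in_bytes' to 288MB, while each pass of the task's dd loop writes count=512 blocks of bs=1M. The C++ sketch below only redoes that arithmetic with the values copied from the log; the constant and function names are hypothetical, and it assumes that Mesos's MB and dd's 1M both mean 1024 × 1024 bytes.

```cpp
#include <cstdint>

// Values copied from the log above. All sizes are in MiB, assuming Mesos
// prints binary megabytes and dd's bs=1M means 1 MiB.
constexpr std::uint64_t kTaskMemMiB = 256;     // task: mem(*):256
constexpr std::uint64_t kExecutorMemMiB = 32;  // executor: mem(*):32
constexpr std::uint64_t kDdCount = 512;        // dd count=512
constexpr std::uint64_t kDdBlockMiB = 1;       // dd bs=1M

// Limit logged for 'memory.limit_in_bytes': the sum of the task's and the
// executor's memory (hypothetical helper, not Mesos code).
constexpr std::uint64_t containerLimitMiB() {
  return kTaskMemMiB + kExecutorMemMiB;
}

// Amount written by a single pass of the task's `while true; do dd ...` loop.
constexpr std::uint64_t ddWritePerPassMiB() { return kDdCount * kDdBlockMiB; }

// How far one pass of the loop overshoots the container limit.
constexpr std::uint64_t overshootMiB() {
  return ddWritePerPassMiB() - containerLimitMiB();
}
```

With these values containerLimitMiB() is 288 and overshootMiB() is 224, so every pass writes well past the limit. Presumably that is what produces the pressure events the test counts, assuming the page cache that dd fills is charged to the container's memory cgroup.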
> Installing Mesos 0.24.0 on multiple systems. Failed test on
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> -----------------------------------------------------------------------------------------------------------
>
> Key: MESOS-3586
> URL: https://issues.apache.org/jira/browse/MESOS-3586
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.24.0
> Environment: Ubuntu 14.04, 3.13.0-32 generic
> Reporter: Miguel Bernadin
>
> I am installing Mesos 0.24.0 on 4 servers that have very similar hardware and
> software configurations.
> After running ../configure, make, and make check, some servers completed
> successfully and others failed on the test
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics.
> Is there something I should check in this test?
> PERFORMED MAKE CHECK NODE-001
> [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0
> I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave
> 20151005-143735-2393768202-35106-27900-S0
> Registered executor on svdidac038.techlabs.accenture.com
> Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0
> Forked command at 38510
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> PERFORMED MAKE CHECK NODE-002
> [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0
> I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave
> 20151005-143857-2360213770-50427-26325-S0
> Registered executor on svdidac039.techlabs.accenture.com
> Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 37028
> ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure
> Expected: (usage.get().mem_medium_pressure_counter()) >=
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6
> 2015-10-05
> 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697:
> Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server
> refused to accept the client
> [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms)
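Both failures trip the same ordering between the three pressure counters: on 0.26-rc2, memory_pressure_tests.cpp:143 expected mem_low_pressure_counter() >= mem_medium_pressure_counter() and got 1 vs 6; on 0.24.0, memory_pressure_tests.cpp:145 expected mem_medium_pressure_counter() >= mem_critical_pressure_counter() and got 5 vs 6. The ordering is expected because, under the cgroup v1 memory.pressure_level interface, a listener registered for one level is also signalled for every higher level. The C++ sketch below restates the two checks as plain functions so the reported numbers can be plugged in; the function names are hypothetical and this is not the test's actual code.

```cpp
#include <cstdint>

// Hypothetical restatement of the two EXPECT_GE checks reported from
// src/tests/containerizer/memory_pressure_tests.cpp. Assuming a listener on a
// level is also signalled for higher levels, a consistent snapshot of the
// counters should satisfy low >= medium >= critical.

// Reported at line 143: low counter must be at least the medium counter.
constexpr bool lowCoversMedium(std::uint64_t low, std::uint64_t medium) {
  return low >= medium;
}

// Reported at line 145: medium counter must be at least the critical counter.
constexpr bool mediumCoversCritical(std::uint64_t medium,
                                    std::uint64_t critical) {
  return medium >= critical;
}

// Both checks together, for a full snapshot of the three counters.
constexpr bool countersOrdered(std::uint64_t low, std::uint64_t medium,
                               std::uint64_t critical) {
  return lowCoversMedium(low, medium) && mediumCoversCritical(medium, critical);
}
```

Plugging in the reported values, lowCoversMedium(1, 6) and mediumCoversCritical(5, 6) both evaluate to false, matching the two failures, whereas an ordered snapshot such as countersOrdered(7, 6, 5) passes.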
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)