[ 
https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034263#comment-15034263
 ] 

Jan Schlicht commented on MESOS-3586:
-------------------------------------

I have to reopen this, as I've found the same behavior using the 0.26-rc2 on 
CentOS 7.1. Noticed some flakiness while running {{sudo ./bin/mesos-tests.sh}} 
and could reproduce it by running {{sudo ./bin/mesos-tests.sh - 
--gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_Statistics" 
--gtest_repeat=-1 --gtest_break_on_failure}} until it breaks.

Here's a verbose output of a failing test:
{noformat}
[ RUN      ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
I1201 18:07:51.136508 18883 cgroups.cpp:2429] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.144594 18886 cgroups.cpp:1411] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
 after 7.076864ms
I1201 18:07:51.151480 18882 cgroups.cpp:2447] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
I1201 18:07:51.162557 18886 cgroups.cpp:1440] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5
 after 11.026944ms
I1201 18:07:51.172379 18887 cgroups.cpp:2429] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.183791 18881 cgroups.cpp:1411] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 
7.8272ms
I1201 18:07:51.192354 18887 cgroups.cpp:2447] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb
I1201 18:07:51.199439 18885 cgroups.cpp:1440] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 
7.028224ms
I1201 18:07:51.332849 18866 leveldb.cpp:176] Opened db in 6.74674ms
I1201 18:07:51.335450 18866 leveldb.cpp:183] Compacted db in 2.554513ms
I1201 18:07:51.335539 18866 leveldb.cpp:198] Created db iterator in 53851ns
I1201 18:07:51.335556 18866 leveldb.cpp:204] Seeked to beginning of db in 3455ns
I1201 18:07:51.335561 18866 leveldb.cpp:273] Iterated through 0 keys in the db 
in 107ns
I1201 18:07:51.335666 18866 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1201 18:07:51.337374 18881 recover.cpp:449] Starting replica recovery
I1201 18:07:51.338235 18881 recover.cpp:475] Replica is in EMPTY status
I1201 18:07:51.340142 18880 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (14)@127.0.0.1:57652
I1201 18:07:51.340749 18882 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1201 18:07:51.340975 18885 master.cpp:367] Master 
2f17d97c-de40-491e-9706-bf83a9ffd08c (centos71) started on 127.0.0.1:57652
I1201 18:07:51.341475 18884 recover.cpp:566] Updating replica status to STARTING
I1201 18:07:51.341152 18885 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/ap4rPt/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ap4rPt/master" 
--zk_session_timeout="10secs"
W1201 18:07:51.341752 18885 master.cpp:372]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers 
or slaves. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1201 18:07:51.341794 18885 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1201 18:07:51.341804 18885 master.cpp:419] Master only allowing authenticated 
slaves to register
I1201 18:07:51.341879 18885 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/ap4rPt/credentials'
I1201 18:07:51.345211 18885 master.cpp:458] Using default 'crammd5' 
authenticator
I1201 18:07:51.345268 18882 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 3.5302ms
I1201 18:07:51.345289 18882 replica.cpp:323] Persisted replica status to 
STARTING
I1201 18:07:51.345350 18885 authenticator.cpp:520] Initializing server SASL
I1201 18:07:51.345512 18882 recover.cpp:475] Replica is in STARTING status
I1201 18:07:51.346271 18885 master.cpp:495] Authorization enabled
I1201 18:07:51.346827 18886 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (15)@127.0.0.1:57652
I1201 18:07:51.347117 18882 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1201 18:07:51.347568 18886 recover.cpp:566] Updating replica status to VOTING
I1201 18:07:51.349238 18886 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 1.409364ms
I1201 18:07:51.349272 18886 replica.cpp:323] Persisted replica status to VOTING
I1201 18:07:51.349385 18886 recover.cpp:580] Successfully joined the Paxos group
I1201 18:07:51.349553 18886 recover.cpp:464] Recover process terminated
I1201 18:07:51.351759 18880 master.cpp:1606] The newly elected leader is 
[email protected]:57652 with id 2f17d97c-de40-491e-9706-bf83a9ffd08c
I1201 18:07:51.351795 18880 master.cpp:1619] Elected as the leading master!
I1201 18:07:51.351820 18880 master.cpp:1379] Recovering from registrar
I1201 18:07:51.352057 18885 registrar.cpp:309] Recovering registrar
I1201 18:07:51.353216 18887 log.cpp:661] Attempting to start the writer
I1201 18:07:51.355137 18885 replica.cpp:496] Replica received implicit promise 
request from (16)@127.0.0.1:57652 with proposal 1
I1201 18:07:51.357552 18885 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 2.373033ms
I1201 18:07:51.357578 18885 replica.cpp:345] Persisted promised to 1
I1201 18:07:51.358737 18881 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1201 18:07:51.360587 18887 replica.cpp:391] Replica received explicit promise 
request from (17)@127.0.0.1:57652 for position 0 with proposal 2
I1201 18:07:51.361548 18887 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 918190ns
I1201 18:07:51.361569 18887 replica.cpp:715] Persisted action at 0
I1201 18:07:51.363291 18882 replica.cpp:540] Replica received write request for 
position 0 from (18)@127.0.0.1:57652
I1201 18:07:51.363356 18882 leveldb.cpp:438] Reading position from leveldb took 
40274ns
I1201 18:07:51.364213 18882 leveldb.cpp:343] Persisting action (14 bytes) to 
leveldb took 804446ns
I1201 18:07:51.364233 18882 replica.cpp:715] Persisted action at 0
I1201 18:07:51.365104 18884 replica.cpp:694] Replica received learned notice 
for position 0 from @0.0.0.0:0
I1201 18:07:51.366173 18884 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 1.043935ms
I1201 18:07:51.366197 18884 replica.cpp:715] Persisted action at 0
I1201 18:07:51.366211 18884 replica.cpp:700] Replica learned NOP action at 
position 0
I1201 18:07:51.367842 18887 log.cpp:677] Writer started with ending position 0
I1201 18:07:51.369870 18884 leveldb.cpp:438] Reading position from leveldb took 
49993ns
I1201 18:07:51.372493 18882 registrar.cpp:342] Successfully fetched the 
registry (0B) in 20.388096ms
I1201 18:07:51.372692 18882 registrar.cpp:441] Applied 1 operations in 69005ns; 
attempting to update the 'registry'
I1201 18:07:51.376373 18880 log.cpp:685] Attempting to append 158 bytes to the 
log
I1201 18:07:51.377168 18883 coordinator.cpp:350] Coordinator attempting to 
write APPEND action at position 1
I1201 18:07:51.379091 18883 replica.cpp:540] Replica received write request for 
position 1 from (19)@127.0.0.1:57652
I1201 18:07:51.380544 18883 leveldb.cpp:343] Persisting action (177 bytes) to 
leveldb took 1.418125ms
I1201 18:07:51.380570 18883 replica.cpp:715] Persisted action at 1
I1201 18:07:51.382406 18885 replica.cpp:694] Replica received learned notice 
for position 1 from @0.0.0.0:0
I1201 18:07:51.382995 18885 leveldb.cpp:343] Persisting action (179 bytes) to 
leveldb took 563416ns
I1201 18:07:51.383013 18885 replica.cpp:715] Persisted action at 1
I1201 18:07:51.383025 18885 replica.cpp:700] Replica learned APPEND action at 
position 1
I1201 18:07:51.387128 18885 registrar.cpp:486] Successfully updated the 
'registry' in 14.34496ms
I1201 18:07:51.387258 18885 registrar.cpp:372] Successfully recovered registrar
I1201 18:07:51.387377 18882 log.cpp:704] Attempting to truncate the log to 1
I1201 18:07:51.387805 18886 coordinator.cpp:350] Coordinator attempting to 
write TRUNCATE action at position 2
I1201 18:07:51.387814 18884 master.cpp:1416] Recovered 0 slaves from the 
Registry (120B) ; allowing 10mins for slaves to re-register
I1201 18:07:51.389592 18885 replica.cpp:540] Replica received write request for 
position 2 from (20)@127.0.0.1:57652
I1201 18:07:51.390261 18885 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 642820ns
I1201 18:07:51.390278 18885 replica.cpp:715] Persisted action at 2
I1201 18:07:51.392351 18882 replica.cpp:694] Replica received learned notice 
for position 2 from @0.0.0.0:0
I1201 18:07:51.393007 18882 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 614947ns
I1201 18:07:51.393052 18882 leveldb.cpp:401] Deleting ~1 keys from leveldb took 
27109ns
I1201 18:07:51.393064 18882 replica.cpp:715] Persisted action at 2
I1201 18:07:51.393076 18882 replica.cpp:700] Replica learned TRUNCATE action at 
position 2
I1201 18:07:51.402930 18866 containerizer.cpp:142] Using isolation: 
cgroups/mem,filesystem/posix
I1201 18:07:51.498821 18866 linux_launcher.cpp:103] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1201 18:07:51.503211 18866 systemd.cpp:128] systemd version `208` detected
W1201 18:07:51.503242 18866 systemd.cpp:136] Required functionality `Delegate` 
was introduced in Version `218`. Your system may not function properly; however 
since some distributions have patched systemd packages, your system may still 
be functional. This is why we keep running. See MESOS-3352 for more information
I1201 18:07:51.508138 18866 systemd.cpp:210] Started systemd slice 
`mesos_executors.slice`
I1201 18:07:51.514886 18881 slave.cpp:191] Slave started on 1)@127.0.0.1:57652
I1201 18:07:51.514910 18881 slave.cpp:192] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos_test_c677185f-73c5-4af9-9029-007647d301f9" 
--container_disk_watch_interval="15secs" --containerizers="mesos" 
--credential="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/credential"
 --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_auth_server="auth.docker.io" --docker_auth_server_port="443" 
--docker_kill_orphans="true" 
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" 
--docker_puller_timeout="60" --docker_registry="registry-1.docker.io" 
--docker_registry_port="443" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/fetch"
 --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" --isolation="cgroups/mem" 
--launcher_dir="/home/vagrant/mesos/build-ssl/src" --logbufsecs="0" 
--logging_level="INFO" --oversubscribed_resources_interval="15secs" 
--perf_duration="10secs" --perf_interval="1mins" 
--qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" 
--recovery_timeout="15mins" --registration_backoff_factor="10ms" 
--resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]" 
--revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" 
--strict="true" --switch_user="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL"
W1201 18:07:51.515254 18881 slave.cpp:195]
**************************************************
Slave bound to loopback interface! Cannot communicate with remote master(s). 
You might want to set '--ip' flag to a routable IP address.
**************************************************
I1201 18:07:51.515266 18881 credentials.hpp:85] Loading credential for 
authentication from 
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/credential'
I1201 18:07:51.515554 18881 slave.cpp:322] Slave using credential for: 
test-principal
W1201 18:07:51.516015 18866 sched.cpp:1542]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote 
master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a 
routable IP address.
**************************************************
I1201 18:07:51.518287 18881 slave.cpp:392] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1201 18:07:51.518360 18881 slave.cpp:400] Slave attributes: [  ]
I1201 18:07:51.518374 18881 slave.cpp:405] Slave hostname: centos71
I1201 18:07:51.518379 18881 slave.cpp:410] Slave checkpoint: true
I1201 18:07:51.518565 18866 sched.cpp:166] Version: 0.26.0
I1201 18:07:51.519347 18883 sched.cpp:264] New master detected at 
[email protected]:57652
I1201 18:07:51.519405 18883 sched.cpp:320] Authenticating with master 
[email protected]:57652
I1201 18:07:51.519417 18883 sched.cpp:327] Using default CRAM-MD5 authenticatee
I1201 18:07:51.519732 18886 authenticatee.cpp:99] Initializing client SASL
I1201 18:07:51.519870 18886 authenticatee.cpp:123] Creating new client SASL 
connection
I1201 18:07:51.520319 18883 state.cpp:54] Recovering state from 
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/meta'
I1201 18:07:51.520938 18884 master.cpp:5150] Authenticating 
[email protected]:57652
I1201 18:07:51.521203 18887 status_update_manager.cpp:202] Recovering status 
update manager
I1201 18:07:51.521390 18884 containerizer.cpp:384] Recovering containerizer
I1201 18:07:51.521517 18886 authenticator.cpp:100] Creating new server SASL 
connection
I1201 18:07:51.522209 18886 authenticatee.cpp:214] Received SASL authentication 
mechanisms: CRAM-MD5
I1201 18:07:51.522238 18886 authenticatee.cpp:240] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I1201 18:07:51.522457 18883 authenticator.cpp:205] Received SASL authentication 
start
I1201 18:07:51.522548 18883 authenticator.cpp:327] Authentication requires more 
steps
I1201 18:07:51.522692 18886 authenticatee.cpp:260] Received SASL authentication 
step
I1201 18:07:51.522801 18886 authenticator.cpp:233] Received SASL authentication 
step
I1201 18:07:51.522953 18886 authenticator.cpp:319] Authentication success
I1201 18:07:51.523102 18885 authenticatee.cpp:300] Authentication success
I1201 18:07:51.523355 18887 master.cpp:5180] Successfully authenticated 
principal 'test-principal' at 
[email protected]:57652
I1201 18:07:51.524238 18880 sched.cpp:409] Successfully authenticated with 
master [email protected]:57652
I1201 18:07:51.524529 18881 master.cpp:2176] Received SUBSCRIBE call for 
framework 'default' at 
[email protected]:57652
I1201 18:07:51.526083 18881 master.cpp:1645] Authorizing framework principal 
'test-principal' to receive offers for role '*'
I1201 18:07:51.526430 18882 master.cpp:2247] Subscribing framework default with 
checkpointing disabled and capabilities [  ]
I1201 18:07:51.527278 18881 hierarchical.cpp:195] Added framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.527310 18882 sched.cpp:643] Framework registered with 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.532703 18887 slave.cpp:4230] Finished recovery
I1201 18:07:51.533692 18887 slave.cpp:729] New master detected at 
[email protected]:57652
I1201 18:07:51.533741 18880 status_update_manager.cpp:176] Pausing sending 
status updates
I1201 18:07:51.533759 18887 slave.cpp:792] Authenticating with master 
[email protected]:57652
I1201 18:07:51.533769 18887 slave.cpp:797] Using default CRAM-MD5 authenticatee
I1201 18:07:51.533937 18887 slave.cpp:765] Detecting new master
I1201 18:07:51.534019 18884 authenticatee.cpp:123] Creating new client SASL 
connection
I1201 18:07:51.535223 18885 master.cpp:5150] Authenticating 
slave(1)@127.0.0.1:57652
I1201 18:07:51.535610 18885 authenticator.cpp:100] Creating new server SASL 
connection
I1201 18:07:51.536231 18887 authenticatee.cpp:214] Received SASL authentication 
mechanisms: CRAM-MD5
I1201 18:07:51.536253 18887 authenticatee.cpp:240] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I1201 18:07:51.536309 18887 authenticator.cpp:205] Received SASL authentication 
start
I1201 18:07:51.536342 18887 authenticator.cpp:327] Authentication requires more 
steps
I1201 18:07:51.536396 18887 authenticatee.cpp:260] Received SASL authentication 
step
I1201 18:07:51.536480 18887 authenticator.cpp:233] Received SASL authentication 
step
I1201 18:07:51.536531 18887 authenticator.cpp:319] Authentication success
I1201 18:07:51.536628 18885 authenticatee.cpp:300] Authentication success
I1201 18:07:51.536666 18884 master.cpp:5180] Successfully authenticated 
principal 'test-principal' at slave(1)@127.0.0.1:57652
I1201 18:07:51.538077 18885 slave.cpp:860] Successfully authenticated with 
master [email protected]:57652
I1201 18:07:51.538951 18882 master.cpp:3859] Registering slave at 
slave(1)@127.0.0.1:57652 (centos71) with id 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.539645 18881 registrar.cpp:441] Applied 1 operations in 53988ns; 
attempting to update the 'registry'
I1201 18:07:51.540289 18881 log.cpp:685] Attempting to append 324 bytes to the 
log
I1201 18:07:51.540432 18887 coordinator.cpp:350] Coordinator attempting to 
write APPEND action at position 3
I1201 18:07:51.542268 18886 replica.cpp:540] Replica received write request for 
position 3 from (38)@127.0.0.1:57652
I1201 18:07:51.545014 18886 leveldb.cpp:343] Persisting action (343 bytes) to 
leveldb took 2.706483ms
I1201 18:07:51.545053 18886 replica.cpp:715] Persisted action at 3
I1201 18:07:51.546170 18881 replica.cpp:694] Replica received learned notice 
for position 3 from @0.0.0.0:0
I1201 18:07:51.547289 18881 leveldb.cpp:343] Persisting action (345 bytes) to 
leveldb took 1.061009ms
I1201 18:07:51.547319 18881 replica.cpp:715] Persisted action at 3
I1201 18:07:51.547333 18881 replica.cpp:700] Replica learned APPEND action at 
position 3
I1201 18:07:51.548413 18881 registrar.cpp:486] Successfully updated the 
'registry' in 8.70016ms
I1201 18:07:51.548601 18881 log.cpp:704] Attempting to truncate the log to 3
I1201 18:07:51.549383 18883 coordinator.cpp:350] Coordinator attempting to 
write TRUNCATE action at position 4
I1201 18:07:51.549446 18881 master.cpp:3927] Registered slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71) 
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1201 18:07:51.549605 18886 hierarchical.cpp:344] Added slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71) with cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000] (allocated: )
I1201 18:07:51.549808 18882 slave.cpp:904] Registered with master 
[email protected]:57652; given slave ID 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.550792 18880 status_update_manager.cpp:183] Resuming sending 
status updates
I1201 18:07:51.551144 18882 slave.cpp:963] Forwarding total oversubscribed 
resources
I1201 18:07:51.551143 18885 replica.cpp:540] Replica received write request for 
position 4 from (39)@127.0.0.1:57652
I1201 18:07:51.551512 18881 master.cpp:4979] Sending 1 offers to framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at 
[email protected]:57652
I1201 18:07:51.551702 18881 master.cpp:4269] Received update of slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71) 
with total oversubscribed resources
I1201 18:07:51.551928 18881 hierarchical.cpp:400] Slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71) updated with oversubscribed 
resources  (total: cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000])
I1201 18:07:51.552104 18885 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 903203ns
I1201 18:07:51.552137 18885 replica.cpp:715] Persisted action at 4
I1201 18:07:51.552709 18886 replica.cpp:694] Replica received learned notice 
for position 4 from @0.0.0.0:0
I1201 18:07:51.556046 18886 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 3.315992ms
I1201 18:07:51.556107 18886 leveldb.cpp:401] Deleting ~2 keys from leveldb took 
36657ns
I1201 18:07:51.556152 18886 replica.cpp:715] Persisted action at 4
I1201 18:07:51.556174 18886 replica.cpp:700] Replica learned TRUNCATE action at 
position 4
I1201 18:07:51.556725 18880 master.cpp:2915] Processing ACCEPT call for offers: 
[ 2f17d97c-de40-491e-9706-bf83a9ffd08c-O0 ] on slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71) 
for framework 2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at 
[email protected]:57652
I1201 18:07:51.556794 18880 master.cpp:2711] Authorizing framework principal 
'test-principal' to launch task 8e530058-d9c1-4c6c-8837-09269dbc616a as user 
'root'
I1201 18:07:51.558148 18882 master.hpp:176] Adding task 
8e530058-d9c1-4c6c-8837-09269dbc616a with resources cpus(*):1; mem(*):256; 
disk(*):1024 on slave 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 (centos71)
I1201 18:07:51.558205 18882 master.cpp:3245] Launching task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at 
[email protected]:57652 with resources 
cpus(*):1; mem(*):256; disk(*):1024 on slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
I1201 18:07:51.558461 18886 slave.cpp:1294] Got assigned task 
8e530058-d9c1-4c6c-8837-09269dbc616a for framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.558547 18885 hierarchical.cpp:744] Recovered cpus(*):1; 
mem(*):768; ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):1; mem(*):256; 
disk(*):1024) on slave 2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 from framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.559119 18886 slave.cpp:1410] Launching task 
8e530058-d9c1-4c6c-8837-09269dbc616a for framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.560716 18886 paths.cpp:436] Trying to chown 
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/slaves/2f17d97c-de40-491e-9706-bf83a9ffd08c-S0/frameworks/2f17d97c-de40-491e-9706-bf83a9ffd08c-0000/executors/8e530058-d9c1-4c6c-8837-09269dbc616a/runs/3a2b1c72-96aa-469f-8c3e-c63b55b0375c'
 to user 'root'
I1201 18:07:51.563762 18886 slave.cpp:4999] Launching executor 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 with resources cpus(*):0.1; mem(*):32 
in work directory 
'/tmp/MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_rUspgL/slaves/2f17d97c-de40-491e-9706-bf83a9ffd08c-S0/frameworks/2f17d97c-de40-491e-9706-bf83a9ffd08c-0000/executors/8e530058-d9c1-4c6c-8837-09269dbc616a/runs/3a2b1c72-96aa-469f-8c3e-c63b55b0375c'
I1201 18:07:51.564254 18881 containerizer.cpp:618] Starting container 
'3a2b1c72-96aa-469f-8c3e-c63b55b0375c' for executor 
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework 
'2f17d97c-de40-491e-9706-bf83a9ffd08c-0000'
I1201 18:07:51.564388 18886 slave.cpp:1628] Queuing task 
'8e530058-d9c1-4c6c-8837-09269dbc616a' for executor 
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.574303 18887 mem.cpp:605] Started listening for OOM events for 
container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.577256 18887 mem.cpp:725] Started listening on low memory 
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.579624 18887 mem.cpp:725] Started listening on medium memory 
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.583400 18887 mem.cpp:725] Started listening on critical memory 
pressure events for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.585841 18887 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' 
to 288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.590293 18887 mem.cpp:391] Updated 'memory.limit_in_bytes' to 
288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.597565 18885 linux_launcher.cpp:365] Cloning child process with 
flags =
I1201 18:07:51.603770 18885 linux_launcher.cpp:422] Assigned child process 
'18903' to 'mesos_executors.slice'
I1201 18:07:51.700537 18903 exec.cpp:136] Version: 0.26.0
I1201 18:07:51.706681 18882 slave.cpp:2405] Got registration for executor 
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from executor(1)@127.0.0.1:47143
I1201 18:07:51.709386 18936 exec.cpp:210] Executor registered on slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
Registered executor on centos71
I1201 18:07:51.711493 18883 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' 
to 288MB for container 3a2b1c72-96aa-469f-8c3e-c63b55b0375c
I1201 18:07:51.714323 18885 slave.cpp:1793] Sending queued task 
'8e530058-d9c1-4c6c-8837-09269dbc616a' to executor 
'8e530058-d9c1-4c6c-8837-09269dbc616a' of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 at executor(1)@127.0.0.1:47143
Starting task 8e530058-d9c1-4c6c-8837-09269dbc616a
Forked command at 18939
sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
I1201 18:07:51.719348 18886 slave.cpp:2762] Handling status update TASK_RUNNING 
(UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from executor(1)@127.0.0.1:47143
I1201 18:07:51.719849 18883 status_update_manager.cpp:322] Received status 
update TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.720561 18886 slave.cpp:3087] Forwarding the update TASK_RUNNING 
(UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 to [email protected]:57652
I1201 18:07:51.720746 18886 slave.cpp:3011] Sending acknowledgement for status 
update TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 to executor(1)@127.0.0.1:47143
I1201 18:07:51.722455 18883 master.cpp:4414] Status update TASK_RUNNING (UUID: 
6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 from slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0 at slave(1)@127.0.0.1:57652 (centos71)
I1201 18:07:51.722486 18883 master.cpp:4462] Forwarding status update 
TASK_RUNNING (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
I1201 18:07:51.723788 18883 master.cpp:6066] Updating the state of task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (latest state: TASK_RUNNING, status 
update state: TASK_RUNNING)
I1201 18:07:51.724521 18887 master.cpp:3571] Processing ACKNOWLEDGE call 
6313338a-3f6c-491e-a410-a5fa9a747574 for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000 (default) at 
[email protected]:57652 on slave 
2f17d97c-de40-491e-9706-bf83a9ffd08c-S0
I1201 18:07:51.725114 18886 status_update_manager.cpp:394] Received status 
update acknowledgement (UUID: 6313338a-3f6c-491e-a410-a5fa9a747574) for task 
8e530058-d9c1-4c6c-8837-09269dbc616a of framework 
2f17d97c-de40-491e-9706-bf83a9ffd08c-0000
../../src/tests/containerizer/memory_pressure_tests.cpp:143: Failure
Expected: (usage.get().mem_low_pressure_counter()) >= 
(usage.get().mem_medium_pressure_counter()), actual: 1 vs 6
*** Aborted at 1448993271 (unix time) try "date -d @1448993271" if you are 
using GNU date ***
PC: @          0x1405704 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 18866 (TID 0x7f73d529a8c0) from PID 0; stack 
trace: ***
    @     0x7f73cddc6130 (unknown)
    @          0x1405704 testing::UnitTest::AddTestPartResult()
    @          0x13fa27b testing::internal::AssertHelper::operator=()
    @          0x13d9a23 
mesos::internal::tests::MemoryPressureMesosTest_CGROUPS_ROOT_Statistics_Test::TestBody()
    @          0x1423156 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @          0x141df60 
testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x13ff539 testing::Test::Run()
    @          0x13ffcbc testing::TestInfo::Run()
    @          0x1400302 testing::TestCase::Run()
    @          0x1406bdc testing::internal::UnitTestImpl::RunAllTests()
    @          0x1423d7b 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @          0x141eb0c 
testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x1405922 testing::UnitTest::Run()
    @           0xd08e8c RUN_ALL_TESTS()
    @           0xd08a6a main
    @     0x7f73cc971af5 __libc_start_main
    @           0x90d909 (unknown)
I1201 18:07:52.039213 18934 exec.cpp:465] Slave exited ... shutting down
Shutting down
Sending SIGTERM to process tree at pid 18939
[vagrant@centos71 build-ssl]$ Killing the following process trees:
[
-+- 18939 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done
 \--- 18940 dd count=512 bs=1M if=/dev/zero of=./temp
]
Command terminated with signal Terminated (pid: 18939)
E1201 18:07:52.152863 18935 process.cpp:1911] Failed to shutdown socket with fd 
9: Transport endpoint is not connected
{noformat}

> Installing Mesos 0.24.0 on multiple systems. Failed test on 
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-3586
>                 URL: https://issues.apache.org/jira/browse/MESOS-3586
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>         Environment: Ubuntu 14.04, 3.13.0-32 generic
>            Reporter: Miguel Bernadin
>
> I am install Mesos 0.24.0 on 4 servers which have very similar hardware and 
> software configurations. 
> After performing ../configure, make, and make check some servers have 
> completed successfully and other failed on test [ RUN      ] 
> MemoryPressureMesosTest.CGROUPS_ROOT_Statistics.
> Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001
> [ RUN      ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0
> I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 
> 20151005-143735-2393768202-35106-27900-S0
> Registered executor on svdidac038.techlabs.accenture.com
> Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0
> Forked command at 38510
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> PERFORMED MAKE CHECK NODE-002
> [ RUN      ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0
> I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 
> 20151005-143857-2360213770-50427-26325-S0
> Registered executor on svdidac039.techlabs.accenture.com
> Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 37028
> ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure
> Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6
> 2015-10-05 
> 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to