[ 
https://issues.apache.org/jira/browse/MESOS-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745888#comment-16745888
 ] 

longfei commented on MESOS-9528:
--------------------------------

[~qianzhang] Would you take a look at this, please?

> MemoryPressureMesosTest failed because of OOM.
> ----------------------------------------------
>
>                 Key: MESOS-9528
>                 URL: https://issues.apache.org/jira/browse/MESOS-9528
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: longfei
>            Priority: Major
>
> I found that MemoryPressureMesosTest.ROOT_CGROUPS_Statistics and 
> MemoryPressureMesosTest.ROOT_CGROUPS_SlaveRecovery fail because of OOM 
> when I run make check.
> The log is as follows:
> {code:java}
> I0118 16:01:00.918741 185574 task_status_update_manager.cpp:401] Received 
> task status update acknowledgement (UUID: 
> a0a8dc75-c016-4f4a-9c78-c042c642b7a8) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.093305 185557 memory.cpp:515] OOM detected for container 
> 8353191d-5d91-4780-b096-9d9aa28b0723
> I0118 16:01:01.093466 185557 memory.cpp:555] Memory limit exceeded: 
> Requested: 288MB Maximum Used: 288MB
> MEMORY STATISTICS:
> cache 291041280
> rss 10948608
> rss_huge 0
> mapped_file 0
> writeback 0
> swap 0
> pgpgin 77127
> pgpgout 3399
> pgfault 11626
> pgmajfault 0
> inactive_anon 290988032
> active_anon 10829824
> inactive_file 0
> active_file 0
> unevictable 0
> hierarchical_memory_limit 301989888
> hierarchical_memsw_limit 18446744073709551615
> total_cache 291041280
> total_rss 10948608
> total_rss_huge 0
> total_mapped_file 0
> total_writeback 0
> total_swap 0
> total_pgpgin 77127
> total_pgpgout 3399
> total_pgfault 11626
> total_pgmajfault 0
> total_inactive_anon 290988032
> total_active_anon 10829824
> total_inactive_file 0
> total_active_file 0
> total_unevictable 0
> dd: error writing './temp': Cannot allocate memory
> 278+0 records in
> 277+0 records out
> 291041280 bytes (291 MB) copied, 0.122189 s, 2.4 GB/s
> I0118 16:01:01.093724 185584 containerizer.cpp:2995] Container 
> 8353191d-5d91-4780-b096-9d9aa28b0723 has reached its limit for resource 
> [{"name":"mem","scalar":
> {"value":288.0}
> ,"type":"SCALAR"}] and will be terminated
> I0118 16:01:01.093787 185584 containerizer.cpp:2469] Destroying container 
> 8353191d-5d91-4780-b096-9d9aa28b0723 in RUNNING state
> I0118 16:01:01.093801 185584 containerizer.cpp:3136] Transitioning the state 
> of container 8353191d-5d91-4780-b096-9d9aa28b0723 from RUNNING to DESTROYING
> I0118 16:01:01.093897 185571 linux_launcher.cpp:576] Asked to destroy 
> container 8353191d-5d91-4780-b096-9d9aa28b0723
> I0118 16:01:01.093941 185571 linux_launcher.cpp:618] Destroying cgroup 
> '/sys/fs/cgroup/freezer/mesos_test_4c169006-c6fb-4486-9a6e-5a3d0e9777e6/8353191d-5d91-4780-b096-9d9aa28b0723'
> I0118 16:01:01.094094 185559 cgroups.cpp:2854] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_4c169006-c6fb-4486-9a6e-5a3d0e9777e6/8353191d-5d91-4780-b096-9d9aa28b0723
> I0118 16:01:01.094204 185578 cgroups.cpp:1242] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test_4c169006-c6fb-4486-9a6e-5a3d0e9777e6/8353191d-5d91-4780-b096-9d9aa28b0723
>  after 77056ns
> I0118 16:01:01.094406 185556 cgroups.cpp:2872] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test_4c169006-c6fb-4486-9a6e-5a3d0e9777e6/8353191d-5d91-4780-b096-9d9aa28b0723
> I0118 16:01:01.094564 185575 cgroups.cpp:1271] Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test_4c169006-c6fb-4486-9a6e-5a3d0e9777e6/8353191d-5d91-4780-b096-9d9aa28b0723
>  after 128us
> I0118 16:01:01.096833 185564 slave.cpp:5988] Got exited event for 
> executor(1)@10.10.23.200:18282
> I0118 16:01:01.105190 185568 containerizer.cpp:2975] Container 
> 8353191d-5d91-4780-b096-9d9aa28b0723 has exited
> I0118 16:01:01.106215 185594 slave.cpp:6384] Executor 
> '52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f' of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 terminated with signal Killed
> I0118 16:01:01.107043 185594 slave.cpp:5316] Handling status update 
> TASK_FAILED (Status UUID: 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 from @0.0.0.0:0
> E0118 16:01:01.107215 185560 slave.cpp:5647] Failed to update resources for 
> container 8353191d-5d91-4780-b096-9d9aa28b0723 of executor 
> '52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f' running task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f on status update for terminal task, 
> destroying container: Container not found
> W0118 16:01:01.107260 185587 composing.cpp:609] Attempted to destroy unknown 
> container 8353191d-5d91-4780-b096-9d9aa28b0723
> I0118 16:01:01.107281 185555 task_status_update_manager.cpp:328] Received 
> task status update TASK_FAILED (Status UUID: 
> 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.107362 185569 slave.cpp:5808] Forwarding the update 
> TASK_FAILED (Status UUID: 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 to [email protected]:26211
> I0118 16:01:01.107483 185563 master.cpp:8496] Status update TASK_FAILED 
> (Status UUID: 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 from agent 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0 at slave(1)@10.10.23.200:26211 
> (n10-023-200.byted.org)
> I0118 16:01:01.107515 185563 master.cpp:8553] Forwarding status update 
> TASK_FAILED (Status UUID: 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.107573 185563 master.cpp:11190] Updating the state of task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 (latest state: TASK_FAILED, status 
> update state: TASK_FAILED)
> I0118 16:01:01.107702 185563 master.cpp:6319] Processing ACKNOWLEDGE call for 
> status 23265b36-faf6-4709-9f67-b9fd817c90ba for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 (default) at 
> [email protected]:26211 on agent 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0
> I0118 16:01:01.107728 185563 master.cpp:11288] Removing task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f with resources cpus(allocated: *):1; 
> mem(allocated: *):256; disk(allocated: *):1024 of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 on agent 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0 at slave(1)@10.10.23.200:26211 
> (n10-023-200.byted.org)
> I0118 16:01:01.107863 185556 task_status_update_manager.cpp:401] Received 
> task status update acknowledgement (UUID: 
> 23265b36-faf6-4709-9f67-b9fd817c90ba) for task 
> 52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.108021 185575 slave.cpp:6482] Cleaning up executor 
> '52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f' of framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000 at executor(1)@10.10.23.200:18282
> I0118 16:01:01.108121 185579 gc.cpp:95] Scheduling 
> '/tmp/MemoryPressureMesosTest_ROOT_CGROUPS_Statistics_i83ugR/slaves/842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0/frameworks/842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000/executors/52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f/runs/8353191d-5d91-4780-b096-9d9aa28b0723'
>  for gc 6.99999874884444days in the future
> I0118 16:01:01.108178 185579 gc.cpp:95] Scheduling 
> '/tmp/MemoryPressureMesosTest_ROOT_CGROUPS_Statistics_i83ugR/slaves/842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0/frameworks/842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000/executors/52b5ebc8-26b0-4439-ac3c-a7bb4dc5330f'
>  for gc 6.99999874812444days in the future
> I0118 16:01:01.108189 185575 slave.cpp:6611] Cleaning up framework 
> 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.108223 185564 task_status_update_manager.cpp:289] Closing task 
> status update streams for framework 842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000
> I0118 16:01:01.108242 185559 gc.cpp:95] Scheduling 
> '/tmp/MemoryPressureMesosTest_ROOT_CGROUPS_Statistics_i83ugR/slaves/842f64ee-274e-4eb3-9787-ce8ee1000ffe-S0/frameworks/842f64ee-274e-4eb3-9787-ce8ee1000ffe-0000'
>  for gc 6.99999874740741days in the future
> ../../src/tests/containerizer/memory_pressure_tests.cpp:156: Failure
> (usage).failure(): Unknown container 8353191d-5d91-4780-b096-9d9aa28b0723
> {code}
>  
>  
> It seems that the memory was consumed by cache: in the statistics above, 
> cache is 291041280 bytes while rss is only 10948608 bytes. I could not tell 
> why. The test passes if I change the offer's memory from 256 MB to 512 MB. 
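
To make the observation about cache concrete, here is a minimal sketch that redoes the arithmetic on the numbers from the MEMORY STATISTICS dump quoted above. The helper names (parse_memory_stat, summarize) are hypothetical and not part of Mesos; only the values are copied from the log. It suggests that cache plus rss adds up exactly to the 288MB limit, and that the cache figure matches the byte count reported by dd. One possible reading, which the log alone does not confirm, is that ./temp sits on a memory-backed filesystem, since the cache is accounted under inactive_anon rather than inactive_file.

```python
# Values copied verbatim from the MEMORY STATISTICS dump and the dd output
# in the quoted log; nothing here is measured independently.
MEMORY_STAT = """\
cache 291041280
rss 10948608
inactive_anon 290988032
active_anon 10829824
inactive_file 0
active_file 0
hierarchical_memory_limit 301989888
"""
DD_BYTES_COPIED = 291041280  # from "291041280 bytes (291 MB) copied"

MIB = 1024 * 1024


def parse_memory_stat(text):
    """Parse 'key value' lines of a memory.stat style dump into a dict."""
    stats = {}
    for line in text.splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def summarize(stats):
    """Compare the charged memory (cache + rss) against the cgroup limit."""
    limit = stats["hierarchical_memory_limit"]
    charged = stats["cache"] + stats["rss"]
    return {
        "limit_mib": limit / MIB,
        "charged_mib": charged / MIB,
        "cache_share": stats["cache"] / charged,
        "headroom_bytes": limit - charged,
        "cache_equals_dd_bytes": stats["cache"] == DD_BYTES_COPIED,
    }


stats = parse_memory_stat(MEMORY_STAT)
summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs the headroom is zero bytes, so the container sits exactly at its limit when the OOM is reported, and roughly 96% of the charge is cache rather than rss.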



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
