-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7887/#review13203
-----------------------------------------------------------

Ship it!



src/slave/cgroups_isolation_module.hpp
<https://reviews.apache.org/r/7887/#comment28405>

    It's a bit odd to have:
    
    killed // whether killExecutor() called
    destroyed // whether destroyed by module
    
    Maybe rename 'killed' to something more indicative? E.g.:
    bool killAttempted; // Have we tried to kill it via killExecutor()?
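A minimal sketch of what the suggested rename could look like. The struct name CgroupInfo, its other fields, and the exitedOnItsOwn() helper are assumptions for illustration and are not taken from the diff. Only the two flags come from the review comment.

```cpp
#include <string>

// Hypothetical stand-in for the per-executor bookkeeping in
// cgroups_isolation_module.hpp; only the two flags come from the review.
struct CgroupInfo
{
  std::string frameworkId;
  std::string executorId;

  // Have we tried to kill it via killExecutor()? (previously `killed`)
  bool killAttempted = false;

  // Has the isolation module itself destroyed the cgroup (e.g. on OOM)?
  bool destroyed = false;
};

// Hypothetical helper showing how the clearer name reads at a call site:
// the executor exited on its own if nobody asked for a kill and the
// module did not destroy it.
inline bool exitedOnItsOwn(const CgroupInfo& info)
{
  return !info.killAttempted && !info.destroyed;
}
```

With 'killed' the call site reads as "was it killed?", which overlaps with 'destroyed'. 'killAttempted' makes clear that it only records the request.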



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/7887/#comment28407>

    This message comes out a bit rough in the log:
    
    I1106 01:53:54.852854 61941 cgroups_isolation_module.cpp:689] MEMORY LIMIT: 100663296 bytes
    MEMORY USAGE: 100663296 bytes
    MEMORY STATISTICS: 
    cache 245760
    rss 100417536
    mapped_file 24576
    pgpgin 7320
    pgpgout 6250
    inactive_anon 0
    active_anon 1826816
    inactive_file 192512
    active_file 53248
    unevictable 98590720
    hierarchical_memory_limit 100663296
    total_cache 245760
    total_rss 100417536
    total_mapped_file 24576
    total_pgpgin 7320
    total_pgpgout 6250
    total_inactive_anon 0
    total_active_anon 1826816
    total_inactive_file 192512
    total_active_file 53248
    total_unevictable 98590720
    
    vs. having the OOM notice and the data in one log message, with indentation:
    
    I1106 01:53:54.852854 61941 cgroups_isolation_module.cpp:689] OOM detected for executor default of framework 201211060153-2081170186-5432-61885-0000 with tag bf7fc2e7-a9c4-4240-8300-18acb99490dc
      MEMORY LIMIT: 100663296 bytes
      MEMORY USAGE: 100663296 bytes
      MEMORY STATISTICS: 
        cache 245760
        rss 100417536
        mapped_file 24576
        pgpgin 7320
        pgpgout 6250
        inactive_anon 0
        active_anon 1826816
        inactive_file 192512
        active_file 53248
        unevictable 98590720
        hierarchical_memory_limit 100663296
        total_cache 245760
        total_rss 100417536
        total_mapped_file 24576
        total_pgpgin 7320
        total_pgpgout 6250
        total_inactive_anon 0
        total_active_anon 1826816
        total_inactive_file 192512
        total_active_file 53248
        total_unevictable 98590720
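A minimal sketch of building the combined message as one string, so it can be emitted with a single glog LOG(INFO) call. glog prints its prefix only once, before the first line. The function name formatOomMessage and its parameters are hypothetical. The stats argument is assumed to hold the "key value" lines of the cgroup's memory.stat file, as in the output above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds a single log message for an OOM event, with
// the memory statistics (one "key value" pair per line) indented under a
// header.
std::string formatOomMessage(
    const std::string& executorId,
    const std::string& frameworkId,
    const std::string& tag,
    unsigned long long limitBytes,
    unsigned long long usageBytes,
    const std::string& stats)
{
  std::ostringstream out;
  out << "OOM detected for executor " << executorId
      << " of framework " << frameworkId
      << " with tag " << tag << "\n";
  out << "  MEMORY LIMIT: " << limitBytes << " bytes\n";
  out << "  MEMORY USAGE: " << usageBytes << " bytes\n";
  out << "  MEMORY STATISTICS:\n";

  // Indent every non-empty statistics line by four spaces.
  std::istringstream in(stats);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) {
      out << "    " << line << "\n";
    }
  }
  return out.str();
}
```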
    
    Also, can you prepend the fact that an OOM happened to the reason?
    
    Like:
    I1106 01:54:00.542150 61984 sched.cpp:326] Status update: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
    Task in state TASK_FAILED
    Reason: OOM Detected // <-- Here
    MEMORY LIMIT: 100663296 bytes
    MEMORY USAGE: 100663296 bytes
    MEMORY STATISTICS: 
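A small sketch of putting the cause first in the failure message, so that "Reason:" starts with "OOM Detected" and the memory details follow. The helper name buildFailureMessage is an assumption. Where the message is attached to the status update is not shown here.

```cpp
#include <string>

// Hypothetical helper: builds the message carried by the TASK_FAILED
// status update so that the cause comes first and the memory details
// (limit, usage, statistics) follow on subsequent lines.
std::string buildFailureMessage(const std::string& cause,
                                const std::string& details)
{
  std::string message = cause;
  if (!details.empty()) {
    message += "\n" + details;
  }
  return message;
}
```

For example, buildFailureMessage("OOM Detected", "MEMORY LIMIT: 100663296 bytes") yields the two-line reason shown above.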
    



src/slave/slave.cpp
<https://reviews.apache.org/r/7887/#comment28406>

    Just curious, why the check for command executor?
    
    More specifically, why is a terminated non-destroyed command executor failed instead of lost?
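A minimal sketch of the decision this question implies. If the executor was destroyed by the isolation module, or if it is the command executor, its tasks are reported as TASK_FAILED. Otherwise they are reported as TASK_LOST. The enum and function names are hypothetical stand-ins, and the actual check in slave.cpp may be structured differently.

```cpp
// Hypothetical stand-in for the protobuf TaskState enum (only the two
// values relevant here).
enum class TaskState { TASK_FAILED, TASK_LOST };

// Hypothetical sketch of the decision the question is about: which
// terminal state the live tasks of a terminated executor are moved to.
inline TaskState terminalStateFor(bool destroyedByModule, bool isCommandExecutor)
{
  // Destroyed by the isolation module (e.g. after an OOM): report failure.
  // Command executor: also reported as a failure. This is the branch the
  // question asks about; the rationale is not stated in the review.
  if (destroyedByModule || isCommandExecutor) {
    return TaskState::TASK_FAILED;
  }

  // Any other executor that terminated without being destroyed.
  return TaskState::TASK_LOST;
}
```

Written as one predicate, the asymmetry being questioned is the isCommandExecutor term.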


- Ben Mahler


On Nov. 6, 2012, 8:33 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7887/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2012, 8:33 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/examples/balloon_framework.cpp e9b60de0c7d3a96381aff37340e0f5ac499850dd 
>   src/slave/cgroups_isolation_module.hpp dd4703a1ca584d2347efac95bcdfae9a84544d4a
>   src/slave/cgroups_isolation_module.cpp 3d10ee568b8f194543707374f34f21bd3a927958
>   src/slave/lxc_isolation_module.cpp 36d86e08f7b511371a9a2053ddf43477063a79f1
>   src/slave/process_based_isolation_module.cpp b0b6a81c93acc68d1f4acbdda5ab2f9f96b5fb5a
>   src/slave/slave.hpp be0d7cc239e51636bb07e12c3046e0751a958787
>   src/slave/slave.cpp 2bd2dbce538a6108dd9fe607829cfbdab33e0778
>   src/tests/fault_tolerance_tests.cpp a01d1aef012b636f2ced64d4d2ffabfb6ce42644
>   src/tests/gc_tests.cpp b61b2de621e227f327ce546b62f8dfc528f3894e 
>   src/tests/master_tests.cpp d9cd09c5650234351f570f0a035f4b61cd2d00f5 
> 
> Diff: https://reviews.apache.org/r/7887/diff/
> 
> 
> Testing
> -------
> 
> make check (CentOS)
> 
> [vinod@smfd-aki-27-sr1:~/mesos/build] $ sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*CgroupsIsolationTest*" --verbose
> ...
> ...
> I1106 01:53:54.852120 61941 cgroups_isolation_module.cpp:617] OOM notifier is triggered for executor default of framework 201211060153-2081170186-5432-61885-0000 with tag bf7fc2e7-a9c4-4240-8300-18acb99490dc
> I1106 01:53:54.852165 61941 cgroups_isolation_module.cpp:662] OOM detected for executor default of framework 201211060153-2081170186-5432-61885-0000 with tag bf7fc2e7-a9c4-4240-8300-18acb99490dc
> I1106 01:53:54.852854 61941 cgroups_isolation_module.cpp:689] MEMORY LIMIT: 100663296 bytes
> MEMORY USAGE: 100663296 bytes
> MEMORY STATISTICS: 
> cache 245760
> rss 100417536
> mapped_file 24576
> pgpgin 7320
> pgpgout 6250
> inactive_anon 0
> active_anon 1826816
> inactive_file 192512
> active_file 53248
> unevictable 98590720
> hierarchical_memory_limit 100663296
> total_cache 245760
> total_rss 100417536
> total_mapped_file 24576
> total_pgpgin 7320
> total_pgpgout 6250
> total_inactive_anon 0
> total_active_anon 1826816
> total_inactive_file 192512
> total_active_file 53248
> total_unevictable 98590720
> I1106 01:53:54.852898 61941 cgroups_isolation_module.cpp:408] Killing executor default of framework 201211060153-2081170186-5432-61885-0000
> I1106 01:53:54.855185 61937 cgroups.cpp:1116] Attempting to freeze cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
> I1106 01:53:55.536480 61907 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:53:55.536576 61907 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 130.08us
> I1106 01:53:56.537866 61903 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:53:56.537951 61903 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 103.18us
> I1106 01:53:57.538408 61912 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:53:57.538483 61912 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 93.44us
> I1106 01:53:58.539499 61908 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:53:58.539593 61908 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 113.75us
> W1106 01:53:59.532685 61903 master.cpp:79] No whitelist given. Advertising offers for all slaves
> I1106 01:53:59.540832 61912 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:53:59.540907 61912 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 91.56us
> W1106 01:54:00.020642 61941 cgroups.cpp:1201] Unable to freeze cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc' within 51 attempts
> I1106 01:54:00.022102 61937 cgroups.cpp:1131] Attempting to thaw cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
> I1106 01:54:00.022274 61937 cgroups.cpp:1237] Successfully thawed cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
> I1106 01:54:00.030532 61948 process.cpp:872] Socket closed while receiving
> I1106 01:54:00.129642 61936 cgroups_isolation_module.cpp:705] Successfully destroyed the cgroup mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc
> I1106 01:54:00.539801 61944 cgroups_isolation_module.cpp:468] Telling slave of terminated executor default of framework 201211060153-2081170186-5432-61885-0000
> I1106 01:54:00.539939 61934 slave.cpp:1008] Executor 'default' of framework 201211060153-2081170186-5432-61885-0000 has terminated with signal Killed
> I1106 01:54:00.541018 61934 slave.cpp:833] Status update: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
> I1106 01:54:00.541290 61944 cgroups_isolation_module.cpp:441] Asked to update resources for an unknown/terminated executor
> I1106 01:54:00.541384 61904 hierarchical_allocator_process.hpp:608] No resources available to allocate!
> I1106 01:54:00.541460 61904 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 87.63us
> I1106 01:54:00.541471 61936 gc.cpp:97] Scheduling /tmp/mesos/slaves/201211060153-2081170186-5432-61885-0/frameworks/201211060153-2081170186-5432-61885-0000/executors/default/runs/c842b51d-d962-4b20-a80a-bfe484f6dc95 for removal
> I1106 01:54:00.541610 61907 master.cpp:1024] Status update from slave(1)@10.35.12.124:36146: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
> I1106 01:54:00.541759 61907 master.hpp:288] Removing task with resources mem=32 on slave 201211060153-2081170186-5432-61885-0
> I1106 01:54:00.541872 61907 master.cpp:1125] Executor default of framework 201211060153-2081170186-5432-61885-0000 on slave 201211060153-2081170186-5432-61885-0 (smfd-aki-27-sr1.devel.twitter.com) exited with status 9
> I1106 01:54:00.541872 61912 hierarchical_allocator_process.hpp:491] Recovered mem=32 on slave 201211060153-2081170186-5432-61885-0 from framework 201211060153-2081170186-5432-61885-0000
> I1106 01:54:00.541967 61912 hierarchical_allocator_process.hpp:491] Recovered mem=64 on slave 201211060153-2081170186-5432-61885-0 from framework 201211060153-2081170186-5432-61885-0000
> I1106 01:54:00.542150 61984 sched.cpp:326] Status update: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
> Task in state TASK_FAILED
> Reason: MEMORY LIMIT: 100663296 bytes
> MEMORY USAGE: 100663296 bytes
> MEMORY STATISTICS: 
> cache 245760
> rss 100417536
> mapped_file 24576
> pgpgin 7320
> pgpgout 6250
> inactive_anon 0
> active_anon 1826816
> inactive_file 192512
> active_file 53248
> unevictable 98590720
> hierarchical_memory_limit 100663296
> total_cache 245760
> total_rss 100417536
> total_mapped_file 24576
> total_pgpgin 7320
> total_pgpgout 6250
> total_inactive_anon 0
> total_active_anon 1826816
> total_inactive_file 192512
> total_active_file 53248
> total_unevictable 98590720
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
