[
https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545012#comment-14545012
]
chenqiuhao edited comment on MESOS-2706 at 5/15/15 6:33 AM:
------------------------------------------------------------
Hi Tim, I updated Mesos to 0.22.1 today and reproduced the issue as well.
I noticed that as the number of tasks grew, the CPU usage of the 'lt-mesos-slave'
process increased and stayed pegged at 100% (one full CPU), as shown below.
========Top result========
top - 14:12:17 up 48 days, 16:41, 8 users, load average: 1.13, 1.36, 2.00
Tasks: 1717 total, 1 running, 1716 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.2 us, 8.2 sy, 0.0 ni, 83.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 32782296 total, 17602948 used, 15179348 free, 193112 buffers
KiB Swap: 20479996 total, 0 used, 20479996 free. 10514872 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
271790 root 20 0 952028 54868 14192 S 100.6 0.2 163:26.57
lt-mesos-slave
180453 root 20 0 12.791g 955736 26216 S 4.2 2.9 11:12.81 java
……
====================
Then I used the command top -Hp 271790 to find which thread of lt-mesos-slave
was consuming the most CPU, and found it was thread 271808:
========top -Hp 271790 result========
top - 14:16:20 up 48 days, 16:45, 8 users, load average: 1.36, 1.35, 1.84
Threads: 12 total, 1 running, 11 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.2 us, 8.0 sy, 0.0 ni, 83.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 32782296 total, 17599804 used, 15182492 free, 193112 buffers
KiB Swap: 20479996 total, 0 used, 20479996 free. 10515348 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
271808 root 20 0 952028 54868 14192 R 99.9 0.2 165:04.18
lt-mesos-slave
271807 root 20 0 952028 54868 14192 S 0.7 0.2 0:21.79
lt-mesos-slave
271810 root 20 0 952028 54868 14192 S 0.3 0.2 0:16.86
lt-mesos-slave
271813 root 20 0 952028 54868 14192 S 0.3 0.2 0:19.85
lt-mesos-slave
271814 root 20 0 952028 54868 14192 S 0.3 0.2 0:23.27
lt-mesos-slave
271815 root 20 0 952028 54868 14192 S 0.3 0.2 0:06.96
lt-mesos-slave
271790 root 20 0 952028 54868 14192 S 0.0 0.2 0:00.05
lt-mesos-slave
271809 root 20 0 952028 54868 14192 S 0.0 0.2 0:17.30
lt-mesos-slave
271811 root 20 0 952028 54868 14192 S 0.0 0.2 0:22.63
lt-mesos-slave
271812 root 20 0 952028 54868 14192 S 0.0 0.2 0:16.02
lt-mesos-slave
271820 root 20 0 952028 54868 14192 S 0.0 0.2 0:00.53
lt-mesos-slave
271821 root 20 0 952028 54868 14192 S 0.0 0.2 0:00.00
lt-mesos-slave
=======================
I then ran the pstack command several times to check what thread 271808 was
doing, and found:
=========1st time pstack 271808 ===============
Thread 1 (process 271808):
#0 0x00007fcb05f5725d in read () from /lib64/libpthread.so.0
#1 0x00007fcb05cbe567 in std::__basic_file<char>::xsgetn(char*, long) () from
/lib64/libstdc++.so.6
#2 0x00007fcb05cf7c50 in std::basic_filebuf<char, std::char_traits<char>
>::underflow() () from /lib64/libstdc++.so.6
#3 0x00007fcb05cc16e9 in std::istream::get(std::basic_streambuf<char,
std::char_traits<char> >&, char) () from /lib64/libstdc++.so.6
#4 0x00007fcb06837205 in proc::cmdline(Option<int> const&) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#5 0x00007fcb06837ec8 in os::process(int) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#6 0x00007fcb06b5fd1c in mesos::internal::usage(int, bool, bool) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#7 0x00007fcb06a97eb0 in
mesos::internal::slave::DockerContainerizerProcess::__usage(mesos::ContainerID
const&, int) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#8 0x00007fcb06a9817d in
mesos::internal::slave::DockerContainerizerProcess::usage(mesos::ContainerID
const&) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#9 0x00007fcb06aac623 in std::_Function_handler<void (process::ProcessBase*),
process::Future<mesos::ResourceStatistics>
process::dispatch<mesos::ResourceStatistics,
mesos::internal::slave::DockerContainerizerProcess, mesos::ContainerID const&,
mesos::ContainerID>(process::PID<mesos::internal::slave::DockerContainerizerProcess>
const&, process::Future<mesos::ResourceStatistics>
(mesos::internal::slave::DockerContainerizerProcess::*)(mesos::ContainerID
const&),
mesos::ContainerID)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#10 0x00007fcb06dca62a in
process::ProcessManager::resume(process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#11 0x00007fcb06dca8ec in process::schedule(void*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#12 0x00007fcb05f50df3 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fcb0545f3dd in clone () from /lib64/libc.so.6
=========2nd time pstack 271808 ==========
Thread 1 (process 271808):
#0 0x00007fcb05450a3d in open64 () from /lib64/libc.so.6
#1 0x00007fcb053e1160 in __GI__IO_file_fopen () from /lib64/libc.so.6
#2 0x00007fcb053d5914 in __fopen_internal () from /lib64/libc.so.6
#3 0x00000000004204cf in os::read(std::string const&) ()
#4 0x00007fcb06835cc7 in proc::status(int) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#5 0x00007fcb06837b71 in os::process(int) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#6 0x00007fcb06b5fd1c in mesos::internal::usage(int, bool, bool) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#7 0x00007fcb06a97eb0 in
mesos::internal::slave::DockerContainerizerProcess::__usage(mesos::ContainerID
const&, int) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#8 0x00007fcb06a9817d in
mesos::internal::slave::DockerContainerizerProcess::usage(mesos::ContainerID
const&) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#9 0x00007fcb06aac623 in std::_Function_handler<void (process::ProcessBase*),
process::Future<mesos::ResourceStatistics>
process::dispatch<mesos::ResourceStatistics,
mesos::internal::slave::DockerContainerizerProcess, mesos::ContainerID const&,
mesos::ContainerID>(process::PID<mesos::internal::slave::DockerContainerizerProcess>
const&, process::Future<mesos::ResourceStatistics>
(mesos::internal::slave::DockerContainerizerProcess::*)(mesos::ContainerID
const&),
mesos::ContainerID)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#10 0x00007fcb06dca62a in
process::ProcessManager::resume(process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#11 0x00007fcb06dca8ec in process::schedule(void*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#12 0x00007fcb05f50df3 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fcb0545f3dd in clone () from /lib64/libc.so.6
===========3rd time pstack 271808=========================
Thread 1 (process 271808):
#0 0x00007fcb05459d57 in munmap () from /lib64/libc.so.6
#1 0x00007fcb053e26c2 in __GI__IO_setb () from /lib64/libc.so.6
#2 0x00007fcb053e0e90 in __GI__IO_file_close_it () from /lib64/libc.so.6
#3 0x00007fcb053d4f68 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4 0x0000000000420560 in os::read(std::string const&) ()
#5 0x00007fcb06835cc7 in proc::status(int) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#6 0x00007fcb06837b71 in os::process(int) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#7 0x00007fcb06b5fd1c in mesos::internal::usage(int, bool, bool) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#8 0x00007fcb06a97eb0 in
mesos::internal::slave::DockerContainerizerProcess::__usage(mesos::ContainerID
const&, int) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#9 0x00007fcb06a9817d in
mesos::internal::slave::DockerContainerizerProcess::usage(mesos::ContainerID
const&) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#10 0x00007fcb06aac623 in std::_Function_handler<void (process::ProcessBase*),
process::Future<mesos::ResourceStatistics>
process::dispatch<mesos::ResourceStatistics,
mesos::internal::slave::DockerContainerizerProcess, mesos::ContainerID const&,
mesos::ContainerID>(process::PID<mesos::internal::slave::DockerContainerizerProcess>
const&, process::Future<mesos::ResourceStatistics>
(mesos::internal::slave::DockerContainerizerProcess::*)(mesos::ContainerID
const&),
mesos::ContainerID)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#11 0x00007fcb06dca62a in
process::ProcessManager::resume(process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#12 0x00007fcb06dca8ec in process::schedule(void*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#13 0x00007fcb05f50df3 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fcb0545f3dd in clone () from /lib64/libc.so.6
=========================
The results show that thread 271808 was busy handling the usage() collection:
#7 0x00007fcb06b5fd1c in mesos::internal::usage(int, bool, bool) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#8 0x00007fcb06a97eb0 in
mesos::internal::slave::DockerContainerizerProcess::__usage(mesos::ContainerID
const&, int) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#9 0x00007fcb06a9817d in
mesos::internal::slave::DockerContainerizerProcess::usage(mesos::ContainerID
const&) () from /wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
#10 0x00007fcb06aac623 in std::_Function_handler<void (process::ProcessBase*),
process::Future<mesos::ResourceStatistics>
process::dispatch<mesos::ResourceStatistics,
mesos::internal::slave::DockerContainerizerProcess, mesos::ContainerID const&,
mesos::ContainerID>(process::PID<mesos::internal::slave::DockerContainerizerProcess>
const&, process::Future<mesos::ResourceStatistics>
(mesos::internal::slave::DockerContainerizerProcess::*)(mesos::ContainerID
const&),
mesos::ContainerID)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from
/wls/mesos-0.22.1/build/src/.libs/libmesos-0.22.1.so
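Looking at the top frames of these dumps, the busy thread seems to spend its time
re-opening and reading /proc/<pid>/cmdline (character by character, through
std::istream::get) and /proc/<pid>/status for the monitored processes on every
usage() call. Below is a minimal sketch of that access pattern; the helper names
are hypothetical and this is not the actual Mesos/stout code, only an
illustration of what the stacks imply.
========sketch: per-pid /proc reads (illustrative)========
// Illustrative only -- not the actual Mesos/stout implementation.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read /proc/<pid>/cmdline one character at a time (compare the frames
// read -> xsgetn -> underflow -> istream::get -> proc::cmdline above).
std::vector<std::string> readCmdline(int pid)
{
  std::ifstream file("/proc/" + std::to_string(pid) + "/cmdline");
  std::vector<std::string> args;
  std::string arg;
  char c;
  while (file.get(c)) {
    if (c == '\0') {        // arguments are NUL-separated in cmdline
      args.push_back(arg);
      arg.clear();
    } else {
      arg += c;
    }
  }
  return args;
}

// Open and slurp /proc/<pid>/status (compare the open64/fopen/fclose
// frames inside os::read called from proc::status above).
std::string readStatus(int pid)
{
  std::ifstream file("/proc/" + std::to_string(pid) + "/status");
  std::ostringstream contents;
  contents << file.rdbuf();
  return contents.str();
}
====================
Both files are re-opened and re-read on every sampling round, so as the number
of docker tasks grows, the thread doing this work has more and more of these
reads to perform each round.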
As far as I can tell, lt-mesos-slave is a single multi-threaded process. The
bottleneck is that this usage collection can only use one CPU, and that
usage-collection thread is saturated; there may also be some resource-contention
bug involved.
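The bottom frames (process::dispatch -> ProcessManager::resume ->
process::schedule) also suggest that all of these usage() requests are
dispatched to one libprocess actor and therefore run one after another, no
matter how many worker threads the slave has. The sketch below is a generic
single-consumer queue, purely to illustrate that serialization; it is not
libprocess itself.
========sketch: serialized dispatch (illustrative)========
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// One worker thread draining a queue of callbacks: however many requests
// are dispatched, at most one CPU's worth of work gets done at a time.
class SingleWorker {
public:
  SingleWorker() : thread_([this] { run(); }) {}

  ~SingleWorker()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // dispatch() returns immediately; the callback runs later, serialized.
  void dispatch(std::function<void()> fn)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(fn));
    }
    cv_.notify_one();
  }

private:
  void run()
  {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;                       // shutting down, queue drained
        }
        fn = std::move(queue_.front());
        queue_.pop();
      }
      fn();                             // e.g. the /proc reads sketched above
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread thread_;                  // declared last: started after the rest
};

int main()
{
  SingleWorker worker;

  // With N running containers, N usage requests queue up behind each other
  // every sampling interval; the time per round grows linearly with N.
  for (int container = 0; container < 100; ++container) {
    worker.dispatch([container] {
      std::cout << "collecting usage for container " << container << "\n";
    });
  }
}
====================
That matches what top shows above: one thread (271808) pinned at ~100% while the
other lt-mesos-slave threads stay mostly idle.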
> When the docker-tasks grow, the time spare between Queuing task and Starting
> container grows
> --------------------------------------------------------------------------------------------
>
> Key: MESOS-2706
> URL: https://issues.apache.org/jira/browse/MESOS-2706
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.22.0
> Environment: My Environment info:
> Mesos 0.22.0 & Marathon 0.82-RC1 both running in one host-server.
> Every docker-task requires 0.02 CPU and 128MB of memory, and the server has
> 8 CPUs and 24GB of memory, so Mesos can launch thousands of tasks in theory.
> Each docker-task is a very lightweight container that just launches an sshd
> service.
> Reporter: chenqiuhao
>
> At the beginning, Marathon could launch docker-tasks very fast, but when the
> number of tasks on the single mesos-slave host reached 50, Marathon seemed to
> launch docker-tasks slowly.
> So I checked the mesos-slave log and found that the gap between
> 'Queuing task' and 'Starting container' grew.
> For example, launching the 1st docker task takes about 0.008s:
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing
> task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor
> dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework
> '20150202-112355-2684495626-5050-26153-0000
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container
> 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor
> 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework
> '20150202-112355-2684495626-5050-26153-0000'
> Launching the 50th docker task takes about 4.9s:
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor
> dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework
> '20150202-112355-2684495626-5050-26153-0000
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container
> '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor
> 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework
> '20150202-112355-2684495626-5050-26153-0000'
> And when I launched the 100th docker task, it took about 13s!
> I did the same test on a server-host with 24 CPUs and 256GB of memory and got
> the same result.
> Has anybody had the same experience, or can anyone help run the same stress
> test?