[
https://issues.apache.org/jira/browse/TS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yunkai Zhang updated TS-2089:
-----------------------------
Description:
we should make log collation more thread & cpu nice.
{code}
top - 22:34:22 up 3:37, 1 user, load average: 1.20, 1.71, 1.66
Tasks: 979 total, 2 running, 977 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.5%us, 0.6%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 65943744k total, 62536348k used, 3407396k free, 90544k buffers
Swap: 2097144k total, 0k used, 2097144k free, 54189788k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4651 ats 20 0 12.9g 5.3g 4928 R 100.1 8.5 197:51.78 [LOGGING]
4460 ats 20 0 12.9g 5.3g 4928 S 7.9 8.5 18:12.57 [ET_NET 44]
4457 ats 20 0 12.9g 5.3g 4928 S 5.3 8.5 7:32.34 [ET_NET 41]
4461 ats 20 0 12.9g 5.3g 4928 S 4.3 8.5 6:41.79 [ET_NET 45]
4453 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 6:44.52 [ET_NET 37]
4463 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 7:54.01 [ET_NET 47]
3652 root 39 19 0 0 0 S 1.0 0.0 2:14.15 kipmi0
4425 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.21 [ET_NET 9]
4426 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.42 [ET_NET 10]
4430 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.55 [ET_NET 14]
132 root 20 0 0 0 0 S 0.3 0.0 0:00.82 events/1
4417 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.82 [ET_NET 1]
4418 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:20.02 [ET_NET 2]
4420 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.64 [ET_NET 4]
4422 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.84 [ET_NET 6]
4427 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.94 [ET_NET 11]
4435 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.88 [ET_NET 19]
{code}
== Update ===
I have developed a patch to fix this issue. After I rebase it to master branch,
I'll attach it to here. Here is the description of my patch:
---------------
We found that CPU of logging thread could be easy to reach up 100% in
collation host, but disk IO was low at the same time.
The bottleneck of logging thread is that some preprocessing job, such as
convert LogBuffer to ascii text, consume so much CPU time. And more
worse, the write() operation will block logging thread.
So this patch try to split logging thread into two parts:
1) Configurable preproc threads, which are responsiable for processing all
of prepare work, and then forward the preprocessed buffer to flush thread,
or send them to CollationClient/HostSM.
2) One Flush thread, it will consume preprocessed buffers and write them to
disk. In our testing, one flush thread is enough for us.
TODO: This patch supports only one flush thread, we can improve it to
"one flush thread per file/disk" in the future.
== How to configure ==
The number of preproc threads is 1 by default.
Please modify "proxy.config.log.collation_preproc_threads" option to
change it.
---------------
======
After apply this patch, let's have a look the 'top' output:
{code}
[[email protected] core]# top -H -p 3984 -b -n 1
top - 15:39:06 up 17:23, 6 users, load average: 3.37, 3.09, 3.15
Tasks: 246 total, 6 running, 240 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.4%us, 0.5%sy, 0.0%ni, 97.6%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65943744k total, 65763084k used, 180660k free, 136688k buffers
Swap: 2097144k total, 0k used, 2097144k free, 57275428k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4237 ats 20 0 8941m 5.2g 5148 R 39.5 8.3 80:46.23 [LOG_PREPROC 2]
4236 ats 20 0 8941m 5.2g 5148 R 37.5 8.3 80:46.53 [LOG_PREPROC 1]
4235 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.71 [LOG_PREPROC 0]
4238 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:48.37 [LOG_PREPROC 3]
4239 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.58 [LOG_PREPROC 4]
4240 ats 20 0 8941m 5.2g 5148 R 11.9 8.3 20:45.28 [LOG_FLUSH]
4037 ats 20 0 8941m 5.2g 5148 S 7.9 8.3 15:06.60 [ET_NET 38]
4045 ats 20 0 8941m 5.2g 5148 S 5.9 8.3 9:55.74 [ET_NET 46]
4038 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 6:08.97 [ET_NET 39]
4042 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 8:36.73 [ET_NET 43]
...
{code}
[LOG_PREPROC 0-4] are collation preproc threads, [LOG_FLUSH] is new flush
thread. They have good throughput now.
was:
we should make log collation more thread & cpu nice.
{code}
top - 22:34:22 up 3:37, 1 user, load average: 1.20, 1.71, 1.66
Tasks: 979 total, 2 running, 977 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.5%us, 0.6%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 65943744k total, 62536348k used, 3407396k free, 90544k buffers
Swap: 2097144k total, 0k used, 2097144k free, 54189788k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4651 ats 20 0 12.9g 5.3g 4928 R 100.1 8.5 197:51.78 [LOGGING]
4460 ats 20 0 12.9g 5.3g 4928 S 7.9 8.5 18:12.57 [ET_NET 44]
4457 ats 20 0 12.9g 5.3g 4928 S 5.3 8.5 7:32.34 [ET_NET 41]
4461 ats 20 0 12.9g 5.3g 4928 S 4.3 8.5 6:41.79 [ET_NET 45]
4453 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 6:44.52 [ET_NET 37]
4463 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 7:54.01 [ET_NET 47]
3652 root 39 19 0 0 0 S 1.0 0.0 2:14.15 kipmi0
4425 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.21 [ET_NET 9]
4426 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.42 [ET_NET 10]
4430 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.55 [ET_NET 14]
132 root 20 0 0 0 0 S 0.3 0.0 0:00.82 events/1
4417 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.82 [ET_NET 1]
4418 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:20.02 [ET_NET 2]
4420 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.64 [ET_NET 4]
4422 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.84 [ET_NET 6]
4427 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.94 [ET_NET 11]
4435 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.88 [ET_NET 19]
{code}
== Update ===
I have developed a patch to fix this issue. After I rebase it to master branch,
I'll attach it to here. Here is the description of my patch:
---------------
We found that CPU of logging thread could be easy to reach up 100% in
collation host, but disk IO was low at the same time.
The bottleneck of logging thread is that some preprocessing job, such as
convert LogBuffer to ascii text, consume so much CPU time. And more
worse, the write() operation will block logging thread.
So this patch try to split logging thread into two parts:
1) Configurable preproc threads, which are responsiable for processing all
of prepare work, and then forward the preprocessed buffer to flush thread,
or send them to CollationClient/HostSM.
2) One Flush thread, it will consume preprocessed buffers and write them to
disk. In our testing, one flush thread is enough for us.
TODO: This patch supports only one flush thread, we can improve it to
"one flush thread per file/disk" in the future.
== How to configure ==
The number of preproc threads is 1 by default.
Please modify "proxy.config.log.collation_preproc_threads" option to
change it.
---------------
======
After apply this patch, let's have a look the 'top' output:
{code}
[[email protected] core]# top -H -p 3984 -b -n 1
top - 15:39:06 up 17:23, 6 users, load average: 3.37, 3.09, 3.15
Tasks: 246 total, 6 running, 240 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.4%us, 0.5%sy, 0.0%ni, 97.6%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65943744k total, 65763084k used, 180660k free, 136688k buffers
Swap: 2097144k total, 0k used, 2097144k free, 57275428k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4237 ats 20 0 8941m 5.2g 5148 R 39.5 8.3 80:46.23 [LOG_PREPROC 2]
4236 ats 20 0 8941m 5.2g 5148 R 37.5 8.3 80:46.53 [LOG_PREPROC 1]
4235 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.71 [LOG_PREPROC 0]
4238 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:48.37 [LOG_PREPROC 3]
4239 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.58 [LOG_PREPROC 4]
4240 ats 20 0 8941m 5.2g 5148 R 11.9 8.3 20:45.28 [LOG_FLUSH]
4037 ats 20 0 8941m 5.2g 5148 S 7.9 8.3 15:06.60 [ET_NET 38]
4045 ats 20 0 8941m 5.2g 5148 S 5.9 8.3 9:55.74 [ET_NET 46]
4038 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 6:08.97 [ET_NET 39]
4042 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 8:36.73 [ET_NET 43]
...
{code}
[LOG_PREPROC 0-4] are collation preproc threads, [LOG_FLUSH] are new flush
threads. They have good throughput now.
> one log collation thread is not enough for busy system
> ------------------------------------------------------
>
> Key: TS-2089
> URL: https://issues.apache.org/jira/browse/TS-2089
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Fix For: 3.5.2
>
>
> we should make log collation more thread & cpu nice.
>
> {code}
> top - 22:34:22 up 3:37, 1 user, load average: 1.20, 1.71, 1.66
> Tasks: 979 total, 2 running, 977 sleeping, 0 stopped, 0 zombie
> Cpu(s): 3.5%us, 0.6%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Mem: 65943744k total, 62536348k used, 3407396k free, 90544k buffers
> Swap: 2097144k total, 0k used, 2097144k free, 54189788k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4651 ats 20 0 12.9g 5.3g 4928 R 100.1 8.5 197:51.78 [LOGGING]
> 4460 ats 20 0 12.9g 5.3g 4928 S 7.9 8.5 18:12.57 [ET_NET 44]
> 4457 ats 20 0 12.9g 5.3g 4928 S 5.3 8.5 7:32.34 [ET_NET 41]
> 4461 ats 20 0 12.9g 5.3g 4928 S 4.3 8.5 6:41.79 [ET_NET 45]
> 4453 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 6:44.52 [ET_NET 37]
> 4463 ats 20 0 12.9g 5.3g 4928 S 3.0 8.5 7:54.01 [ET_NET 47]
> 3652 root 39 19 0 0 0 S 1.0 0.0 2:14.15 kipmi0
> 4425 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.21 [ET_NET 9]
> 4426 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.42 [ET_NET 10]
> 4430 ats 20 0 12.9g 5.3g 4928 S 0.7 8.5 0:19.55 [ET_NET 14]
> 132 root 20 0 0 0 0 S 0.3 0.0 0:00.82 events/1
> 4417 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.82 [ET_NET 1]
> 4418 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:20.02 [ET_NET 2]
> 4420 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.64 [ET_NET 4]
> 4422 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.84 [ET_NET 6]
> 4427 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.94 [ET_NET 11]
> 4435 ats 20 0 12.9g 5.3g 4928 S 0.3 8.5 0:19.88 [ET_NET 19]
> {code}
> == Update ===
> I have developed a patch to fix this issue. After I rebase it to master
> branch, I'll attach it to here. Here is the description of my patch:
> ---------------
> We found that CPU of logging thread could be easy to reach up 100% in
> collation host, but disk IO was low at the same time.
> The bottleneck of logging thread is that some preprocessing job, such as
> convert LogBuffer to ascii text, consume so much CPU time. And more
> worse, the write() operation will block logging thread.
> So this patch try to split logging thread into two parts:
> 1) Configurable preproc threads, which are responsiable for processing all
> of prepare work, and then forward the preprocessed buffer to flush thread,
> or send them to CollationClient/HostSM.
> 2) One Flush thread, it will consume preprocessed buffers and write them to
> disk. In our testing, one flush thread is enough for us.
> TODO: This patch supports only one flush thread, we can improve it to
> "one flush thread per file/disk" in the future.
> == How to configure ==
> The number of preproc threads is 1 by default.
> Please modify "proxy.config.log.collation_preproc_threads" option to
> change it.
> ---------------
> ======
> After apply this patch, let's have a look the 'top' output:
> {code}
> [[email protected] core]# top -H -p 3984 -b -n 1
> top - 15:39:06 up 17:23, 6 users, load average: 3.37, 3.09, 3.15
> Tasks: 246 total, 6 running, 240 sleeping, 0 stopped, 0 zombie
> Cpu(s): 1.4%us, 0.5%sy, 0.0%ni, 97.6%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 65943744k total, 65763084k used, 180660k free, 136688k buffers
> Swap: 2097144k total, 0k used, 2097144k free, 57275428k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4237 ats 20 0 8941m 5.2g 5148 R 39.5 8.3 80:46.23 [LOG_PREPROC 2]
> 4236 ats 20 0 8941m 5.2g 5148 R 37.5 8.3 80:46.53 [LOG_PREPROC 1]
> 4235 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.71 [LOG_PREPROC 0]
> 4238 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:48.37 [LOG_PREPROC 3]
> 4239 ats 20 0 8941m 5.2g 5148 R 35.6 8.3 80:46.58 [LOG_PREPROC 4]
> 4240 ats 20 0 8941m 5.2g 5148 R 11.9 8.3 20:45.28 [LOG_FLUSH]
> 4037 ats 20 0 8941m 5.2g 5148 S 7.9 8.3 15:06.60 [ET_NET 38]
> 4045 ats 20 0 8941m 5.2g 5148 S 5.9 8.3 9:55.74 [ET_NET 46]
> 4038 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 6:08.97 [ET_NET 39]
> 4042 ats 20 0 8941m 5.2g 5148 S 4.0 8.3 8:36.73 [ET_NET 43]
> ...
> {code}
> [LOG_PREPROC 0-4] are collation preproc threads, [LOG_FLUSH] is new flush
> thread. They have good throughput now.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira