On 2015/1/27 13:31, Jérémie Galarneau wrote: > On Mon, Jan 26, 2015 at 7:54 PM, Wang Nan <[email protected]> wrote: >> On 2015/1/27 0:25, Jérémie Galarneau wrote: >>> On Fri, Jan 23, 2015 at 4:35 AM, Wang Nan <[email protected]> wrote: >>>> Hi folks, >>>> >>>> I'd like to share my first tuning experience with perf, ctf and >>>> TraceCompass here, and I hope my experience helpful to diamon.org. Most >>>> part of this mail is talking about my work. If you don't >>>> interest in it, you can directly jump to conclusion part. >>>> >>>> *My Task* >>>> >>>> What I'm working on is finding the reason why CPU idle rate is high when >>>> we benchmarking a database. I think it should be a very simple task: >>>> tracing scheduling and system calls, finding the previous syscall issued >>>> before idle, then based on statistics, collecting some user spaces call >>>> stack, I can give an answer. I use perf to collect trace, >>>> perf-convert-to-ctf to get ctf output and TraceCompass for >>>> visualization. >>>> >>>> >>>> *My Experience* >>>> >>>> First of all I use perf to collect trace: >>>> >>>> # perf record -a -e sched:* -e raw_syscalls:* sleep 1 >>>> >>>> then >>>> >>>> # perf data convert --to-ctf out.ctf >>>> >>>> Which is simple. However, raw_syscalls:* tracepoints export less >>>> information than syscalls:* tracepoints. Without them I have to manually >>>> find syscall name from syscall id. I prefer to use: >>>> >>>> # perf record -a -e sched:* -e syscalls:* sleep 1 >>>> >>>> However there are some bugs and I have to make some patches. They are >>>> posted and being disscussed currently, those bugs are still exist >>>> upstream. >>>> >>>> Then I need to convert perf.data to ctf. It tooks 140.57s to convert >>>> 2598513 samples, which are collected during only 1 second execution. My >>>> working server has 64 2.0GHz Intel Xeon cores, but perf conversion >>>> utilizes only 1 of them. I think this is another thing can be improved. >>> >>> Thanks for taking the time to do this write-up! >>> >>> Would it be possible to profile perf-to-ctf so we can spot the bottleneck? >>> >>> Regards, >>> Jérémie >>> >> >> We should, but I don't have enough time to do this. In addition, I think >> perf-to-ctf >> conversion is possible to be parallelized without too much work. I suggest >> to consider >> code tuning after we done that at perf side. What do you think? > > I'm not sure I agree with the "without too much work" statement since > Babeltrace's CTF Writer API is not thread-safe at this moment. As > Alexandre previously mentionned, the size of the resulting CTF trace > could be a good indication of whether or not IO is the culprit here. > > We have never done any optimization work on Babeltrace and CTF Writer. > We could be smarter on a number of front, notably with regards to > memory allocation; there is currently no pooling of allocated objects > (events, fields, types, etc.). We'll have to take a look at hard data > before getting our hands dirty. However, I'm sure there are tons of > optimization opportunities. > > Jérémie >
Agree. I'll try to tune it when I have time, then we can get another tuning experience. Thank you. >> >>>> >>>> The next step is visualization. Output ctf trace can be opened with >>>> TraceCompass without problem. The most important views for me should be >>>> resources view (I use them to check CPU usage) and control flow view (I >>>> use them to check thread activities). >>>> >>>> The first uncomfortable thing is TraceCompass' slow response time. For >>>> the trace I mentioned above, on resource view, after I click on CPU >>>> idle area, I have to wait more than 10 seconds for event list updating >>>> to get the previous event before the idle area. >>>> >>>> Then I found through resources view that perf itself tooks lots of CPU >>>> time. In my case 33.5% samples are generated by perf itself. One core is >>>> dedicated to perf and never idle or taken by others. I think this should >>>> be another thing needs to be improved: perf should give a way to >>>> blacklist itself when tracing all CPUs. >>>> >>>> TraceCompass doesn't recognize syscall:* tracepoints as CPU status >>>> changing point. I have to also catch raw_syscall:*, and which doubles >>>> the number of samples. >>>> >>>> Finally I found the syscall which cause idle. However I need to write a >>>> script to do statistics. TraceCompass itself is lack a mean to count >>>> different events in my way. >>>> >>>> The next thing I should do is to find the calltrace which issue the >>>> syscall. This time TraceCompass won't help, mostly because perf >>>> convertion now doesn't support converting calltrace. >>>> >>>> *Conclusion* >>>> >>>> I suggest perf and TraceCompass to think about following improvements: >>>> >>>> 1. Reducing the cost of perf recording. There are one third events are >>>> generated by perf itself in my case. Is it possible that perf could >>>> provide an ability that blacklist itself and collect all other >>>> events? >>>> >>>> 2. Improving perf converting performance. Converting perf.data to CTF is >>>> slow, but it should be offline most of the time. We can utilize the >>>> abilities multi-core server to make it working in parallel. >>>> >>>> 3. Improving TraceCompass responding performance, especially when >>>> synchronizing different views. >>>> >>>> 4. Support converting userspace call trace. I think perf side should >>>> already >>>> have a plan on it. >>>> >>>> 5. Ad-Hoc visualization and statistics. Currently TraceCompass only >>>> support dwaring pre-defined events and processes. When I try to >>>> capture syscalls:*, I won't get benefit from TraceCompass because it >>>> doesn't know them. I believe that during system tuning we will >>>> finally get somewhere unable to be pre-defined by TraceCompass >>>> designer. Therefore give users abilities to define their own events >>>> and model should be much helpful. >>>> >>>> Thank you. >>>> >>>> _______________________________________________ >>>> diamon-discuss mailing list >>>> [email protected] >>>> https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss >>> >>> >>> >> >> > > > _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
