On 2015/1/27 0:25, Jérémie Galarneau wrote: > On Fri, Jan 23, 2015 at 4:35 AM, Wang Nan <[email protected]> wrote: >> Hi folks, >> >> I'd like to share my first tuning experience with perf, ctf and >> TraceCompass here, and I hope my experience helpful to diamon.org. Most >> part of this mail is talking about my work. If you don't >> interest in it, you can directly jump to conclusion part. >> >> *My Task* >> >> What I'm working on is finding the reason why CPU idle rate is high when >> we benchmarking a database. I think it should be a very simple task: >> tracing scheduling and system calls, finding the previous syscall issued >> before idle, then based on statistics, collecting some user spaces call >> stack, I can give an answer. I use perf to collect trace, >> perf-convert-to-ctf to get ctf output and TraceCompass for >> visualization. >> >> >> *My Experience* >> >> First of all I use perf to collect trace: >> >> # perf record -a -e sched:* -e raw_syscalls:* sleep 1 >> >> then >> >> # perf data convert --to-ctf out.ctf >> >> Which is simple. However, raw_syscalls:* tracepoints export less >> information than syscalls:* tracepoints. Without them I have to manually >> find syscall name from syscall id. I prefer to use: >> >> # perf record -a -e sched:* -e syscalls:* sleep 1 >> >> However there are some bugs and I have to make some patches. They are >> posted and being disscussed currently, those bugs are still exist >> upstream. >> >> Then I need to convert perf.data to ctf. It tooks 140.57s to convert >> 2598513 samples, which are collected during only 1 second execution. My >> working server has 64 2.0GHz Intel Xeon cores, but perf conversion >> utilizes only 1 of them. I think this is another thing can be improved. > > Thanks for taking the time to do this write-up! > > Would it be possible to profile perf-to-ctf so we can spot the bottleneck? > > Regards, > Jérémie >
We should, but I don't have enough time to do this. In addition, I think perf-to-ctf conversion is possible to be parallelized without too much work. I suggest to consider code tuning after we done that at perf side. What do you think? >> >> The next step is visualization. Output ctf trace can be opened with >> TraceCompass without problem. The most important views for me should be >> resources view (I use them to check CPU usage) and control flow view (I >> use them to check thread activities). >> >> The first uncomfortable thing is TraceCompass' slow response time. For >> the trace I mentioned above, on resource view, after I click on CPU >> idle area, I have to wait more than 10 seconds for event list updating >> to get the previous event before the idle area. >> >> Then I found through resources view that perf itself tooks lots of CPU >> time. In my case 33.5% samples are generated by perf itself. One core is >> dedicated to perf and never idle or taken by others. I think this should >> be another thing needs to be improved: perf should give a way to >> blacklist itself when tracing all CPUs. >> >> TraceCompass doesn't recognize syscall:* tracepoints as CPU status >> changing point. I have to also catch raw_syscall:*, and which doubles >> the number of samples. >> >> Finally I found the syscall which cause idle. However I need to write a >> script to do statistics. TraceCompass itself is lack a mean to count >> different events in my way. >> >> The next thing I should do is to find the calltrace which issue the >> syscall. This time TraceCompass won't help, mostly because perf >> convertion now doesn't support converting calltrace. >> >> *Conclusion* >> >> I suggest perf and TraceCompass to think about following improvements: >> >> 1. Reducing the cost of perf recording. There are one third events are >> generated by perf itself in my case. Is it possible that perf could >> provide an ability that blacklist itself and collect all other >> events? >> >> 2. Improving perf converting performance. Converting perf.data to CTF is >> slow, but it should be offline most of the time. We can utilize the >> abilities multi-core server to make it working in parallel. >> >> 3. Improving TraceCompass responding performance, especially when >> synchronizing different views. >> >> 4. Support converting userspace call trace. I think perf side should already >> have a plan on it. >> >> 5. Ad-Hoc visualization and statistics. Currently TraceCompass only >> support dwaring pre-defined events and processes. When I try to >> capture syscalls:*, I won't get benefit from TraceCompass because it >> doesn't know them. I believe that during system tuning we will >> finally get somewhere unable to be pre-defined by TraceCompass >> designer. Therefore give users abilities to define their own events >> and model should be much helpful. >> >> Thank you. >> >> _______________________________________________ >> diamon-discuss mailing list >> [email protected] >> https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss > > > _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
