On Thursday 18 September 2014 17:36:25 Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 18, 2014 at 02:17:49PM -0600, David Ahern escreveu:
> > On 9/18/14, 1:17 PM, Arnaldo Carvalho de Melo wrote:
> > >> This was also why I asked my initial question, which I want to repeat
> > >> once more:
> > >>
> > >>> Is there a technical reason to not offer a "timer" software event to
> > >>> perf? I'm a complete layman when it comes to Kernel internals, but
> > >>> from a user point of view this would be awesome:
> > >>>
> > >>> perf record --call-graph dwarf -e sw-timer -F 100 someapplication
> > >>>
> > >>> This command would then create a timer in the kernel with a 100Hz
> > >>> frequency. Whenever it fires, the callgraphs of all threads in
> > >>> $someapplication are sampled and written to perf.data. Is this
> > >>> technically not feasible? Or is it simply not implemented?
> > >>> I'm experimenting with a libunwind based profiler, and with some ugly
> > >>> signal hackery I can now grab backtraces by sending my application
> > >>> SIGUSR1. Based on
> >
> > > Humm, can't you do the same thing with perf? I.e. you send SIGUSR1 to
> > > your app with the frequency you want, and then hook a 'perf probe' into
> > > your signal... /me tries some stuff, will get back with results...

That is actually a very good idea. With the more powerful scripting abilities
in perf now, that could/should do the job indeed. I'll also try this out.
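For the record, here is a minimal sketch of how I read that suggestion, just
so we are sure we are talking about the same thing; the names and the
commands below are my own guesses and untested:

/* sigusr1_hook.c -- hypothetical sketch: give the application a stable
 * SIGUSR1 handler that 'perf probe' can attach a uprobe to, then drive
 * it from the outside at the desired frequency. */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Kept out-of-line on purpose so perf has a clean symbol to probe. */
void __attribute__((noinline)) profiling_tick(int sig)
{
	(void)sig;	/* nothing to do here, the uprobe records the callchain */
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = profiling_tick;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	for (;;)		/* stand-in for the real application's work */
		pause();
}

/* And then, if I read the suggestion correctly (commands untested):
 *
 *   perf probe -x ./someapplication profiling_tick
 *   perf record --call-graph dwarf \
 *       -e probe_someapplication:profiling_tick -p $(pidof someapplication)
 *   while true; do kill -USR1 $(pidof someapplication); sleep 0.01; done
 */

One limitation I see right away: kill(2) delivers the signal to only one
thread of the process, so unlike the sw-timer event I have in mind this
would not sample the callchains of all threads at once.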
> > Current profiling options with perf require the process to be running.
> > What
>
> Ok, so you want to see what is the wait channel and unwind the stack
> from there? Is that the case? I.e. again, a sysrq-t equivalent?

Hey again :)

If I'm not mistaken, I could not yet bring my point across. The final goal is
to profile both wait time _and_ CPU time, combined! By sampling a userspace
program at a constant frequency and applying some statistics, one can get
extremely useful information out of the data. I want to use perf to automate
the process that Mike Dunlavey explains here:

http://stackoverflow.com/a/378024/35250

So yes, I want to see the wait channel and unwind from there if the process
is waiting. Otherwise, just do the normal unwinding you'd do when any other
perf event occurs.

> > Milian want is to grab samples every timer expiration even if process is
> > not running.
>
> What for? And by "grab samples" you want to know where it is waiting for
> something, together with its callchain?

See above. If I can sample the callchain every N ms, preferably in a
per-thread manner, I can create tools which find the slowest userspace
functions. This is based on "inclusive" cost, which is made up of CPU time
_and_ wait time combined. With it one will find all of the following (a
sketch of how I currently approximate this in userspace follows the list):

- CPU hotspots
- I/O wait time
- lock contention in a multi-threaded application
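To make this concrete, below is roughly the kind of ugly signal hackery I am
experimenting with right now; take it as a simplified, untested sketch where
glibc's backtrace() stands in for libunwind and all names are made up. A
proper sw-timer event would make all of this unnecessary:

/* wallclock_sample.c -- rough sketch of the "ugly signal hackery":
 * a wall-clock timer interrupts the process ~100 times per second and
 * the handler dumps the current callchain, so time spent blocked in
 * syscalls shows up in the samples just like CPU time does.
 * glibc's backtrace() stands in for libunwind; error handling and the
 * per-thread plumbing are omitted. */
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void sample_handler(int sig)
{
	void *frames[64];
	int n = backtrace(frames, 64);

	/* Not fully async-signal-safe, good enough for a sketch: write the
	 * raw frames to stderr and symbolize them offline. */
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	(void)sig;
}

int main(void)
{
	struct sigaction sa;
	struct itimerval tv;
	void *prime[1];

	backtrace(prime, 1);	/* force libgcc to load outside the handler */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sample_handler;
	sa.sa_flags = SA_RESTART;	/* note: poll() and friends still see EINTR */
	sigaction(SIGALRM, &sa, NULL);

	memset(&tv, 0, sizeof(tv));
	tv.it_interval.tv_usec = 10000;	/* 100 Hz, wall-clock time */
	tv.it_value.tv_usec = 10000;
	setitimer(ITIMER_REAL, &tv, NULL);

	for (;;)		/* "work" that is mostly waiting, to prove the point */
		sleep(1);
}

The wall-clock timer is the crucial bit: unlike ITIMER_PROF or perf's
task-clock, it keeps firing while the process is blocked, which is exactly
how wait time ends up in the samples.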
> > Any limitations that would prevent doing this with a sw event? e.g, mimic
> > task-clock just don't disable the timer when the task is scheduled out.
>
> I'm just trying to figure out if what people want is a complete
> backtrace of all threads in a process, no matter what they are doing, at
> some given time, i.e. at "sysrq-t" time, is that the case?

Yes, that sounds correct to me. I tried it out on my system, but the output
in dmesg is far too large and I could not find the call stack information of
my test application while it slept. Looking at the other backtraces I see
there, I'm not sure whether sysrq-t only outputs a kernel backtrace? Or maybe
it's because I'm missing debug symbols for these applications?

[167576.049113] mysqld          S 0000000000000000     0 17023      1 0x00000000
[167576.049115]  ffff8801222a7ad0 0000000000000082 ffff8800aa8c9460 0000000000014580
[167576.049117]  ffff8801222a7fd8 0000000000014580 ffff8800aa8c9460 ffff8801222a7a18
[167576.049119]  ffffffff812a2640 ffff8800bb975c40 ffff880237cd4ef0 ffff8801222a7b20
[167576.049121] Call Trace:
[167576.049124]  [<ffffffff812a2640>] ? cpumask_next_and+0x30/0x50
[167576.049126]  [<ffffffff810b65b4>] ? add_wait_queue+0x44/0x50
[167576.049128]  [<ffffffff8152ceb9>] schedule+0x29/0x70
[167576.049130]  [<ffffffff8152c4ad>] schedule_hrtimeout_range_clock+0x3d/0x60
[167576.049132]  [<ffffffff8152c4e3>] schedule_hrtimeout_range+0x13/0x20
[167576.049134]  [<ffffffff811d5589>] poll_schedule_timeout+0x49/0x70
[167576.049136]  [<ffffffff811d6d5e>] do_sys_poll+0x4be/0x570
[167576.049138]  [<ffffffff81155880>] ? __alloc_pages_nodemask+0x180/0xc20
[167576.049140]  [<ffffffff8108e3ae>] ? alloc_pid+0x2e/0x4c0
[167576.049142]  [<ffffffff811d5700>] ? poll_select_copy_remaining+0x150/0x150
[167576.049145]  [<ffffffff810bd1f0>] ? cpuacct_charge+0x50/0x60
[167576.049147]  [<ffffffff811a5de3>] ? kmem_cache_alloc+0x143/0x170
[167576.049149]  [<ffffffff8108e3ae>] ? alloc_pid+0x2e/0x4c0
[167576.049151]  [<ffffffff810a7d08>] ? __enqueue_entity+0x78/0x80
[167576.049153]  [<ffffffff810addee>] ? enqueue_entity+0x24e/0xaa0
[167576.049155]  [<ffffffff812a2640>] ? cpumask_next_and+0x30/0x50
[167576.049158]  [<ffffffff810ae74d>] ? enqueue_task_fair+0x10d/0x5b0
[167576.049160]  [<ffffffff81045c3d>] ? native_smp_send_reschedule+0x4d/0x70
[167576.049162]  [<ffffffff8109e740>] ? resched_task+0xc0/0xd0
[167576.049165]  [<ffffffff8109fff7>] ? check_preempt_curr+0x57/0x90
[167576.049167]  [<ffffffff810a2e2b>] ? wake_up_new_task+0x11b/0x1c0
[167576.049169]  [<ffffffff8106d5c6>] ? do_fork+0x136/0x3f0
[167576.049171]  [<ffffffff811df235>] ? __fget_light+0x25/0x70
[167576.049173]  [<ffffffff811d6f37>] SyS_poll+0x97/0x120
[167576.049176]  [<ffffffff81530d69>] ? system_call_fastpath+0x16/0x1b
[167576.049178]  [<ffffffff81530d69>] system_call_fastpath+0x16/0x1b

I'm interested in the other side of the fence. The backtrace could/should
stop at system_call_fastpath.

Bye
-- 
Milian Wolff
m...@milianw.de
http://milianw.de