Re: [PATCH v2 0/7] perf trace pagefaults
On Mon, Jun 23, 2014 at 03:41:39PM +0400, Stanislav Fomichev wrote:
> > > But we then need to predefine many probes for decoding to work in the
> > > form of func:offset, and then play catch-up with all the kernel changes.
> > > Or am I missing something important here?
> > No you don't.
> > If we want to disturb the system in the least way possible, we need to
> > tag along the copying from userspace of those pointers, so that we get
> > them fresh and just stash it in our ring buffer and get out of the way
> > quickly.
> I just thought maybe you have some grand plan in mind about automagically
> adding probes so argument tracing works transparently. I like the
> approach though.

First we use what we have in place, then we optimize it.

> > Almost a year ago, and it still works, now let's see the cset you mention...
> >
> > [acme@zoo linux]$ git describe c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d
> > v3.14-rc1-14-gc4ad8f98bef7
> > [acme@zoo linux]$
> > [root@zoo ~]# uname -r
> > 3.15.0-rc8+
> >
> > Humm, what is the problem?
> I thought that result->name was actually set on 65th line of
> getname_flags, so the above commit would move it to 66th. But it's not
> the case, sorry for confusion.

> > [1] And I feel like all of tools/perf/ is just that, reference
> > implementations, but hopefully done in such a way that may well be
> > useful as-is :-)
> I'd like perf to be a go-to tool for all kinds of performance analysis,

yay, and you're working for that, thanks!

> not just a reference implementation. I believe nobody looks at this
> reference, and we end up with tools like https://github.com/draios/sysdig

Never heard about it, will take a look, thanks for the pointer.

> which do their own events, ring buffer, etc.

There are several out there :)

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/7] perf trace pagefaults
Hi Arnaldo and Stanislav,

On Fri, 20 Jun 2014 10:21:05 -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Jun 20, 2014 at 02:49:42PM +0400, Stanislav Fomichev wrote:
>> This patch series adds support for pagefaults tracing to 'perf trace'
>> command.
>> It seems this feature was planned by Namhyung Kim
>> (http://events.linuxfoundation.org/images/stories/pdf/klf2012_n_kim.pdf page 17/28)
>> but I couldn't find any prior patches/discussion and started from scratch.
>
> Just to clarify here, those slides came from slides I made and in turn
> the whole idea about pagefaults tracing I got from the trace prototype
> that Thomas Gleixner implemented in his 'trace' utility, described
> here:
>
> Announcing a new utility: 'trace'
> http://lwn.net/Articles/415728/

Right, I asked Arnaldo to suggest some cool topics to introduce at KLF 2012
and that was it. I had nothing to do with the features. :)

Keep up the nice work, guys!

Thanks,
Namhyung
Re: [PATCH v2 0/7] perf trace pagefaults
On 6/20/14, 9:24 AM, Arnaldo Carvalho de Melo wrote:
> Right now it is too simple, but I was starting to work (when you jumped
> right in with your work making me stop and go on testing/reviewing :) )
> on making it more generic so that we could defer pretty printing the
> arguments from sys_enter to sys_exit, when, by then, we would already
> have an association of a user level pointer in some specific thread to
> its contents.
>
> This will allow us to resolve the pathname pointer in things like
> open() (i.e. not just after that, in the fd syscalls (write, etc)) and
> as well any other pointer of interest.
>
> By librarizing 'builtin-probe.c', that now uses lots of global
> variables, etc, we would be able to insert probes where we want them to
> capture the contents of pointers, check if the probes are already in
> place, use just the ones that we managed to insert (i.e. that were not
> invalid because the places where we wanted them to be were changed
> across kernel releases, etc).
>
> I.e. no need for actual tracepoints from day one, just wannabe
> tracepoints using whatever probe inserting gizmo the kprobes_tracer used
> by 'perf probe' now thinks it's best to use.
>
> Combine that with using DWARF descriptions (that could be pre cached
> into something like CTF (the DTrace kind of CTF) or similar) like pahole
> does and we would mostly automatically do all this work of prettifying
> syscall parameters.

That was so much handwaving you could keep cool at a World Cup game. :-)

David
Re: [PATCH v2 0/7] perf trace pagefaults
> > But we then need to predefine many probes for decoding to work in the
> > form of func:offset, and then play catch-up with all the kernel changes.
> > Or am I missing something important here?
>
> No you don't.
>
> If we want to disturb the system in the least way possible, we need to
> tag along the copying from userspace of those pointers, so that we get
> them fresh and just stash it in our ring buffer and get out of the way
> quickly.
I just thought maybe you have some grand plan in mind about automagically
adding probes so argument tracing works transparently. I like the
approach though.

> Almost a year ago, and it still works, now let's see the cset you mention...
>
> [acme@zoo linux]$ git describe c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d
> v3.14-rc1-14-gc4ad8f98bef7
> [acme@zoo linux]$
> [root@zoo ~]# uname -r
> 3.15.0-rc8+
>
> Humm, what is the problem?
I thought that result->name was actually set on 65th line of
getname_flags, so the above commit would move it to 66th. But it's not
the case, sorry for confusion.

> [1] And I feel like all of tools/perf/ is just that, reference
> implementations, but hopefully done in such a way that may well be
> useful as-is :-)
I'd like perf to be a go-to tool for all kinds of performance analysis,
not just a reference implementation. I believe nobody looks at this
reference, and we end up with tools like https://github.com/draios/sysdig
which do their own events, ring buffer, etc.
Re: [PATCH v2 0/7] perf trace pagefaults
On Fri, Jun 20, 2014 at 08:18:59PM +0400, Stanislav Fomichev wrote:
> > Hey, haven't you seen the vfs_getname probe? Idea is to hook on where
> > the relevant copy_from_user is done and insert that into the ring
> > buffer, as we already do for mapping fd -> pathname.
> I saw it but didn't actually try because it needs all the debugging
> stuff enabled and in place.

Touché, more on that below...

> > I.e. no need for actual tracepoints from day one, just wannabe
> > tracepoints using whatever probe inserting gizmo the kprobes_tracer used
> > by 'perf probe' now thinks it's best to use.
> But we then need to predefine many probes for decoding to work in the
> form of func:offset, and then play catch-up with all the kernel changes.
> Or am I missing something important here?

No you don't.

If we want to disturb the system in the least way possible, we need to
tag along the copying from userspace of those pointers, so that we get
them fresh and just stash it in our ring buffer and get out of the way
quickly.

> > For now try:
> >
> >   perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
> >   trace
> >
> > And look at how it manages to decode fds.
> I will try, but does 65 still work after
> c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d? :-)

Well, when I prototyped this[1] the idea was that in some areas there is
not that much code flux, so before committing to any kind of new
interface, be it tracepoints or something else, we may well just use
'perf probe' to get what we need, and this was done in...

commit 75b757ca90469e990e6901f4a9497fe4161f7f5a
Author: Arnaldo Carvalho de Melo
Date:   Tue Sep 24 11:04:32 2013 -0300

Almost a year ago, and it still works, now let's see the cset you mention...

[acme@zoo linux]$ git describe c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d
v3.14-rc1-14-gc4ad8f98bef7
[acme@zoo linux]$
[root@zoo ~]# uname -r
3.15.0-rc8+

Humm, what is the problem?

[root@zoo ~]# perf probe -V getname_flags:65
Available variables at getname_flags:65
        @
                char*   filename
                int     len
                int*    empty
                long int        max
                struct filename*        result
[root@zoo ~]#
[root@zoo ~]# perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
Added new event:
  probe:vfs_getname    (on getname_flags:65 with pathname=result->name:string)

You can now use it in all perf tools, such as:

        perf record -e probe:vfs_getname -aR sleep 1

[root@zoo ~]# perf record -e probe:vfs_getname -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.133 MB perf.data (~49505 samples) ]
[root@zoo ~]# perf evlist
probe:vfs_getname
[root@zoo ~]# perf evlist -v
probe:vfs_getname: sample_freq=1, type: 2, config: 1317, size: 96, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, sample_id_all: 1, exclude_guest: 1
[root@zoo ~]# perf script
      perf 11255 [003] 156054.623210: probe:vfs_getname: (811c2e43) pathname="/home/acme/libexec/perf-core/sleep"
      perf 11255 [003] 156054.624759: probe:vfs_getname: (811c2e43) pathname="/usr/lib64/qt-3.3/bin/sleep"
      perf 11255 [003] 156054.624782: probe:vfs_getname: (811c2e43) pathname="/usr/lib64/ccache/sleep"
      perf 11255 [003] 156054.624794: probe:vfs_getname: (811c2e43) pathname="/usr/local/sbin/sleep"
      perf 11255 [003] 156054.624809: probe:vfs_getname: (811c2e43) pathname="/usr/local/bin/sleep"
      perf 11255 [003] 156054.624818: probe:vfs_getname: (811c2e43) pathname="/sbin/sleep"
      perf 11255 [003] 156054.625017: probe:vfs_getname: (811c2e43) pathname="/bin/sleep"
     sleep 11255 [002] 156054.626093: probe:vfs_getname: (811c2e43) pathname="/etc/ld.so.preload"
     sleep 11255 [002] 156054.626114: probe:vfs_getname: (811c2e43) pathname="/etc/ld.so.cache"
     sleep 11255 [002] 156054.626159: probe:vfs_getname: (811c2e43) pathname="/lib64/libc.so.6"
     sleep 11255 [002] 156054.626751: probe:vfs_getname: (811c2e43) pathname="/usr/lib/locale/locale-archive"
goa-daemon  2082 [003] 156054.955138: probe:vfs_getname: (811c2e43) pathname="/etc/localtime"
goa-daemon  2082 [003] 156054.955573: probe:vfs_getname: (811c2e43) pathname="/etc/localtime"
[root@zoo ~]#

Best possible way to do this? Guess not, but I'm looking from a tooling
perspective, i.e. about using what is available, not about adding
requirements to the kernel or toolchain, that we can do after we
prototype in the best way possible with existing facilities.

- Arnaldo

[1] And I feel like all of tools/perf/ is just that, reference
implementations, but hopefully done in such a way that may well be
useful as-is :-)
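As a rough illustration of how a tool could consume the 'perf script' output
shown above, here is a minimal Python sketch that pulls the pathname out of
each probe:vfs_getname line and groups the results by pid. The regular
expression and the helper names (parse_vfs_getname, pathnames_by_pid) are
assumptions made for this example; they are not part of perf, and the pattern
only covers lines shaped like the ones quoted in this thread.

```python
import re

# Matches lines shaped like the quoted `perf script` output:
#   comm pid [cpu] timestamp: probe:vfs_getname: (addr) pathname="..."
# Assumption: comm contains no whitespace, which holds for the lines above.
LINE_RE = re.compile(
    r'^\s*(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+'
    r'(?P<time>\d+\.\d+):\s+probe:vfs_getname:\s+'
    r'\((?P<addr>[0-9a-f]+)\)\s+pathname="(?P<pathname>[^"]*)"\s*$'
)


def parse_vfs_getname(line):
    """Return a dict for one probe:vfs_getname line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "time": float(m.group("time")),
        "pathname": m.group("pathname"),
    }


def pathnames_by_pid(lines):
    """Group the captured pathnames by pid, keeping the order they were seen in."""
    result = {}
    for line in lines:
        rec = parse_vfs_getname(line)
        if rec is not None:
            result.setdefault(rec["pid"], []).append(rec["pathname"])
    return result
```

Feeding it the lines from the session above would give one list of pathnames
for pid 11255 and one for pid 2082; lines that do not match are skipped.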
Re: [PATCH v2 0/7] perf trace pagefaults
> Hey, haven't you seen the vfs_getname probe? Idea is to hook on where
> the relevant copy_from_user is done and insert that into the ring
> buffer, as we already do for mapping fd -> pathname.
I saw it but didn't actually try because it needs all the debugging
stuff enabled and in place.

> I.e. no need for actual tracepoints from day one, just wannabe
> tracepoints using whatever probe inserting gizmo the kprobes_tracer used
> by 'perf probe' now thinks it's best to use.
But we then need to predefine many probes for decoding to work in the
form of func:offset, and then play catch-up with all the kernel changes.
Or am I missing something important here?

> For now try:
>
>   perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
>   trace
>
> And look at how it manages to decode fds.
I will try, but does 65 still work after
c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d? :-)
Re: [PATCH v2 0/7] perf trace pagefaults
On Fri, Jun 20, 2014 at 07:03:18PM +0400, Stanislav Fomichev wrote:
> > Just to clarify here, those slides came from slides I made and in turn
> > the whole idea about pagefaults tracing I got from the trace prototype
> > that Thomas Gleixner implemented in his 'trace' utility, described
> > here:

> > Announcing a new utility: 'trace'
> > http://lwn.net/Articles/415728/

> > The comments section has lots of interesting ideas, some you may find
> > interesting to implement :-)

> > There is a branch in my tree with the branch tglx did his work on:

> > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2
> Wow, thanks, I tried to search lkml for any presence of
> patches/discussion about these slides, but couldn't find anything, thanks for
> pointing it out.

> I really like 'blocking/preempted' indication and of course I miss
> pointers decoding.

> Did anyone really think about decoding pointers and how we can
> implement it (like dumping them upon entering a syscall and then
> using inside the perf trace?)?

Hey, haven't you seen the vfs_getname probe? Idea is to hook on where
the relevant copy_from_user is done and insert that into the ring
buffer, as we already do for mapping fd -> pathname.

Right now it is too simple, but I was starting to work (when you jumped
right in with your work making me stop and go on testing/reviewing :) )
on making it more generic so that we could defer pretty printing the
arguments from sys_enter to sys_exit, when, by then, we would already
have an association of a user level pointer in some specific thread to
its contents.

This will allow us to resolve the pathname pointer in things like
open() (i.e. not just after that, in the fd syscalls (write, etc)) and
as well any other pointer of interest.

By librarizing 'builtin-probe.c', that now uses lots of global
variables, etc, we would be able to insert probes where we want them to
capture the contents of pointers, check if the probes are already in
place, use just the ones that we managed to insert (i.e. that were not
invalid because the places where we wanted them to be were changed
across kernel releases, etc).

I.e. no need for actual tracepoints from day one, just wannabe
tracepoints using whatever probe inserting gizmo the kprobes_tracer used
by 'perf probe' now thinks it's best to use.

Combine that with using DWARF descriptions (that could be pre cached
into something like CTF (the DTrace kind of CTF) or similar) like pahole
does and we would mostly automatically do all this work of prettifying
syscall parameters.

:-)

For now try:

  perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
  trace

And look at how it manages to decode fds.

- Arnaldo
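To make the "defer pretty printing from sys_enter to sys_exit" idea a bit more
concrete, here is a small Python sketch of the bookkeeping it implies. This is
only an illustration under assumptions: the class name DeferredFormatter, its
method names, and the notion that a probe event carries the user pointer
together with its contents are all hypothetical, and none of this is how
builtin-trace.c is actually written.

```python
class DeferredFormatter:
    """Toy model: hold syscall args at entry, print them at exit.

    Between entry and exit, a probe (hypothetical here) reports the contents
    found behind a user pointer for a given thread; at exit, any argument
    that matches such a pointer is shown as a string instead of as hex.
    """

    def __init__(self):
        self.pending = {}    # tid -> (syscall name, args) seen at sys_enter
        self.contents = {}   # (tid, user pointer) -> captured string

    def on_sys_enter(self, tid, name, args):
        # Do not print yet, just remember what was passed in.
        self.pending[tid] = (name, tuple(args))

    def on_pointer_probe(self, tid, ptr, text):
        # Associate a user level pointer in a specific thread with its contents.
        self.contents[(tid, ptr)] = text

    def on_sys_exit(self, tid, ret):
        entry = self.pending.pop(tid, None)
        if entry is None:
            return None  # exit without a recorded entry, nothing to print
        name, args = entry
        shown = []
        for arg in args:
            text = self.contents.pop((tid, arg), None)
            shown.append('"%s"' % text if text is not None else hex(arg))
        return "%s(%s) = %s" % (name, ", ".join(shown), ret)
```

With this model, an open() whose first argument was captured by the probe
would come out at sys_exit with the pathname already resolved, while the
arguments nobody captured stay as raw hex values.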
Re: [PATCH v2 0/7] perf trace pagefaults
> Just to clarify here, those slides came from slides I made and in turn
> the whole idea about pagefaults tracing I got from the trace prototype
> that Thomas Gleixner implemented in his 'trace' utility, described
> here:
>
> Announcing a new utility: 'trace'
> http://lwn.net/Articles/415728/
>
> The comments section has lots of interesting ideas, some you may find
> interesting to implement :-)
>
> There is a branch in my tree with the branch tglx did his work on:
>
> https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2
Wow, thanks, I tried to search lkml for any presence of
patches/discussion about these slides, but couldn't find anything, thanks for
pointing it out.

I really like 'blocking/preempted' indication and of course I miss
pointers decoding.

Did anyone really think about decoding pointers and how we can
implement it (like dumping them upon entering a syscall and then
using inside the perf trace?)?
Re: [PATCH v2 0/7] perf trace pagefaults
On Fri, Jun 20, 2014 at 02:49:42PM +0400, Stanislav Fomichev wrote:
> This patch series adds support for pagefaults tracing to 'perf trace' command.
> It seems this feature was planned by Namhyung Kim
> (http://events.linuxfoundation.org/images/stories/pdf/klf2012_n_kim.pdf page 17/28)
> but I couldn't find any prior patches/discussion and started from scratch.

Just to clarify here, those slides came from slides I made and in turn
the whole idea about pagefaults tracing I got from the trace prototype
that Thomas Gleixner implemented in his 'trace' utility, described
here:

Announcing a new utility: 'trace'
http://lwn.net/Articles/415728/

The comments section has lots of interesting ideas, some you may find
interesting to implement :-)

There is a branch in my tree with the branch tglx did his work on:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2

There you can take a look and compare what you're doing to what he did.

Now I'll go thru your current patches and will cherry pick whatever I
think is OK already, and will try and provide comments for whatever I
think needs more work.

- Arnaldo

> First three patches add the feature and options to enable faults and disable
> syscalls.
> Two last patches add events caching (like it's done in the perf kvm), so that
> we don't get fault events prior to mmap/comm events (makes sense only
> for live mode).
>
> This is just a proof-of-concept, and I'd like to get some comments about
> where and what I got wrong and what additional useful information I can
> expose in the trace.
>
> v2:
> - added more info to the changelogs
> - reworked options (-f -> -F, --pgfaults -> --pf=[all|min|maj])
> - separated tracepoint_handler changes into additional patch
> - separated record/replay into additional patch
> - other fixes pointed out by Arnaldo Carvalho de Melo
>
> Stanislav Fomichev (7):
>   perf trace: add perf_event parameter to tracepoint_handler
>   perf trace: add support for pagefault tracing
>   perf trace: add pagefaults record and replay support
>   perf trace: add pagefault statistics
>   perf trace: add possibility to switch off syscall events
>   perf kvm: move perf_kvm__mmap_read into session utils
>   perf trace: add events cache
>
>  tools/perf/Documentation/perf-trace.txt |  19 ++
>  tools/perf/builtin-kvm.c                |  88 +---
>  tools/perf/builtin-trace.c              | 350 ++--
>  tools/perf/util/session.c               |  85
>  tools/perf/util/session.h               |   5 +
>  5 files changed, 357 insertions(+), 190 deletions(-)
>
> --
> 1.8.3.2
[PATCH v2 0/7] perf trace pagefaults
This patch series adds support for pagefaults tracing to 'perf trace' command.
It seems this feature was planned by Namhyung Kim
(http://events.linuxfoundation.org/images/stories/pdf/klf2012_n_kim.pdf page 17/28)
but I couldn't find any prior patches/discussion and started from scratch.

First three patches add the feature and options to enable faults and disable
syscalls.
Two last patches add events caching (like it's done in the perf kvm), so that
we don't get fault events prior to mmap/comm events (makes sense only
for live mode).

This is just a proof-of-concept, and I'd like to get some comments about
where and what I got wrong and what additional useful information I can
expose in the trace.

v2:
- added more info to the changelogs
- reworked options (-f -> -F, --pgfaults -> --pf=[all|min|maj])
- separated tracepoint_handler changes into additional patch
- separated record/replay into additional patch
- other fixes pointed out by Arnaldo Carvalho de Melo

Stanislav Fomichev (7):
  perf trace: add perf_event parameter to tracepoint_handler
  perf trace: add support for pagefault tracing
  perf trace: add pagefaults record and replay support
  perf trace: add pagefault statistics
  perf trace: add possibility to switch off syscall events
  perf kvm: move perf_kvm__mmap_read into session utils
  perf trace: add events cache

 tools/perf/Documentation/perf-trace.txt |  19 ++
 tools/perf/builtin-kvm.c                |  88 +---
 tools/perf/builtin-trace.c              | 350 ++--
 tools/perf/util/session.c               |  85
 tools/perf/util/session.h               |   5 +
 5 files changed, 357 insertions(+), 190 deletions(-)

--
1.8.3.2
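The events caching mentioned in the cover letter, which keeps fault events
from showing up before the mmap/comm events they depend on, can be pictured
with the following Python sketch. It is a guess at the general technique
(buffer events, then hand them out in timestamp order), not a description of
what the patches or perf kvm actually do; EventCache and its methods are
names invented for this example.

```python
import heapq
import itertools


class EventCache:
    """Buffer events that arrive out of order and release them by timestamp."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so events with equal timestamps keep their arrival order.
        self._seq = itertools.count()

    def push(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, next(self._seq), event))

    def flush(self, up_to):
        """Yield (timestamp, event) pairs with timestamp <= up_to, oldest first."""
        while self._heap and self._heap[0][0] <= up_to:
            timestamp, _, event = heapq.heappop(self._heap)
            yield timestamp, event
```

In live mode a caller would push whatever it reads from each ring buffer and
periodically flush up to a timestamp it believes every buffer has already
passed; how to pick that horizon is left out of the sketch.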
[PATCH v2 0/7] perf trace pagefaults
This patch series adds support for pagefaults tracing to 'perf trace' command. It seems this feature was planned by Namhyung Kim (http://events.linuxfoundation.org/images/stories/pdf/klf2012_n_kim.pdf page 17/28) but I couldn't find any prior patches/discussion and started from scratch. First three patches add the feature and options to enable faults and disable syscalls. Two last patches add events caching (like it's done in the perf kvm), so that we don't get fault events prior to mmap/comm events (makes sense only for live mode). This is just a proof-of-concept, and I'd like to get some comments about where and what I got wrong and what additional useful information I can expose in the trace. v2: - added more info to the changelogs - reworked options (-f - -F, --pgfaults - --pf=[all|min|maj]) - separated tracepoint_handler changes into additional patch - separated record/replay into additional patch - other fixes pointed out by Arnaldo Carvalho de Melo Stanislav Fomichev (7): perf trace: add perf_event parameter to tracepoint_handler perf trace: add support for pagefault tracing perf trace: add pagefaults record and replay support perf trace: add pagefault statistics perf trace: add possibility to switch off syscall events perf kvm: move perf_kvm__mmap_read into session utils perf trace: add events cache tools/perf/Documentation/perf-trace.txt | 19 ++ tools/perf/builtin-kvm.c| 88 +--- tools/perf/builtin-trace.c | 350 ++-- tools/perf/util/session.c | 85 tools/perf/util/session.h | 5 + 5 files changed, 357 insertions(+), 190 deletions(-) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/7] perf trace pagefaults
Em Fri, Jun 20, 2014 at 02:49:42PM +0400, Stanislav Fomichev escreveu: This patch series adds support for pagefaults tracing to 'perf trace' command. It seems this feature was planned by Namhyung Kim (http://events.linuxfoundation.org/images/stories/pdf/klf2012_n_kim.pdf page 17/28) but I couldn't find any prior patches/discussion and started from scratch. Just to clarify here, those slides came from slides I made and in turn the whole idea about pagefaults tracing I got from the trace prototype that Thomas Gleixner implemented in his 'trace' utility, described here: Announcing a new utility: 'trace' http://lwn.net/Articles/415728/ The comments section has lots of interesting ideas, some you may find interesting to implement :-) There is a branch in my tree with the branch tglx did his work on: https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2 There you can take a look and compare what you're doing to what he did. Now I'll go thru your current patches and will cherry pick whatever I think its OK already, and will try and provide comments for whatever I think needs more work. - Arnaldo First three patches add the feature and options to enable faults and disable syscalls. Two last patches add events caching (like it's done in the perf kvm), so that we don't get fault events prior to mmap/comm events (makes sense only for live mode). This is just a proof-of-concept, and I'd like to get some comments about where and what I got wrong and what additional useful information I can expose in the trace. 
v2: - added more info to the changelogs - reworked options (-f - -F, --pgfaults - --pf=[all|min|maj]) - separated tracepoint_handler changes into additional patch - separated record/replay into additional patch - other fixes pointed out by Arnaldo Carvalho de Melo Stanislav Fomichev (7): perf trace: add perf_event parameter to tracepoint_handler perf trace: add support for pagefault tracing perf trace: add pagefaults record and replay support perf trace: add pagefault statistics perf trace: add possibility to switch off syscall events perf kvm: move perf_kvm__mmap_read into session utils perf trace: add events cache tools/perf/Documentation/perf-trace.txt | 19 ++ tools/perf/builtin-kvm.c| 88 +--- tools/perf/builtin-trace.c | 350 ++-- tools/perf/util/session.c | 85 tools/perf/util/session.h | 5 + 5 files changed, 357 insertions(+), 190 deletions(-) -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/7] perf trace pagefaults
> Just to clarify here, those slides came from slides I made and in turn
> the whole idea about pagefaults tracing I got from the trace prototype
> that Thomas Gleixner implemented in his 'trace' utility, described here:
>
> Announcing a new utility: 'trace'
> http://lwn.net/Articles/415728/
>
> The comments section has lots of interesting ideas, some you may find
> interesting to implement :-)
>
> There is a branch in my tree with the branch tglx did his work on:
>
> https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2
Wow, thanks, I tried to search lkml for any presence of
patches/discussion about these slides, but couldn't find anything, thanks
for pointing it out.

I really like 'blocking/preempted' indication and of course I miss
pointers decoding. Did anyone really think about decoding pointers and
how we can implement it (like dumping them upon entering a syscall and
then using inside the perf trace?)?
Re: [PATCH v2 0/7] perf trace pagefaults
On Fri, Jun 20, 2014 at 07:03:18PM +0400, Stanislav Fomichev wrote:
> > Just to clarify here, those slides came from slides I made and in turn
> > the whole idea about pagefaults tracing I got from the trace prototype
> > that Thomas Gleixner implemented in his 'trace' utility, described here:
> >
> > Announcing a new utility: 'trace'
> > http://lwn.net/Articles/415728/
> >
> > The comments section has lots of interesting ideas, some you may find
> > interesting to implement :-)
> >
> > There is a branch in my tree with the branch tglx did his work on:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/trace2
> Wow, thanks, I tried to search lkml for any presence of
> patches/discussion about these slides, but couldn't find anything, thanks
> for pointing it out.
>
> I really like 'blocking/preempted' indication and of course I miss
> pointers decoding. Did anyone really think about decoding pointers and
> how we can implement it (like dumping them upon entering a syscall and
> then using inside the perf trace?)?

Hey, haven't you seen the vfs_getname probe? Idea is to hook on where the
relevant copy_from_user is done and insert that into the ring buffer, as
we already do for mapping fd -> pathname.

Right now it is too simple, but I was starting to work (when you jumped
right in with your work making me stop and go on testing/reviewing :) )
on making it more generic so that we could defer pretty printing the
arguments from sys_enter to sys_exit, when, by then, we would already
have an association of a user level pointer in some specific thread to
its contents.

This will allow us to resolve the pathname pointer in things like
open() (i.e. not just after that, in the fd syscalls (write, etc)) and as
well any other pointer of interest.

By librarizing 'builtin-probe.c', that now uses lots of global variables,
etc, we would be able to insert probes where we want them to capture the
contents of pointers, check if the probes are already in place, use just
the ones that we managed to insert (i.e. that were not invalid because
the places where we wanted them to be were changed across kernel
releases, etc).

I.e. no need for actual tracepoints from day one, just wannabe
tracepoints using whatever probe inserting gizmo the kprobes_tracer used
by 'perf probe' now thinks it's best to use.

Combine that with using DWARF descriptions (that could be pre cached
into something like CTF (the DTrace kind of CTF) or similar) like pahole
does and we would mostly automatically do all this work of prettifying
syscall parameters.

/handwave :-)

For now try:

  perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
  trace

And look at how it manages to decode fds.

- Arnaldo
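The idea of deferring the pretty printing from sys_enter to sys_exit can
be sketched roughly as below. This is only an illustration, not the perf
code: the PointerCache class and its method names are made up here, and
it assumes a probe event that reports both the user pointer and the
string copied from it, which the vfs_getname probe shown in this thread
does not do as-is (it only carries the pathname).

```python
from collections import defaultdict


class PointerCache:
    """Hypothetical per-thread map: user pointer -> contents copied by the kernel."""

    def __init__(self):
        self._by_tid = defaultdict(dict)  # tid -> {pointer: contents}
        self._pending = {}                # tid -> (syscall name, raw args)

    def sys_enter(self, tid, name, args):
        # Do not print yet: the pointer contents are not known at entry.
        self._pending[tid] = (name, args)

    def probe_hit(self, tid, pointer, contents):
        # Assumed probe payload: the pointer and what was copied from it.
        self._by_tid[tid][pointer] = contents

    def sys_exit(self, tid, ret):
        # By now the probe has fired, so pointers can be resolved.
        name, args = self._pending.pop(tid)
        known = self._by_tid.pop(tid, {})
        shown = [repr(known[a]) if a in known else hex(a) for a in args]
        return f"{name}({', '.join(shown)}) = {ret}"


cache = PointerCache()
cache.sys_enter(11255, "open", [0x7FFD1000, 0])
cache.probe_hit(11255, 0x7FFD1000, "/etc/ld.so.cache")
print(cache.sys_exit(11255, 3))
```

Keying the cache by thread keeps a pointer value seen in one thread from
being resolved with contents captured in another one.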
Re: [PATCH v2 0/7] perf trace pagefaults
> Hey, haven't you seen the vfs_getname probe? Idea is to hook on where the
> relevant copy_from_user is done and insert that into the ring buffer, as
> we already do for mapping fd -> pathname.
I saw it but didn't actually try because it needs all the debugging stuff
enabled and in place.

> I.e. no need for actual tracepoints from day one, just wannabe
> tracepoints using whatever probe inserting gizmo the kprobes_tracer used
> by 'perf probe' now thinks it's best to use.
But we then need to predefine many probes for decoding to work in the
form of func:offset, and then play catch-up with all the kernel changes.
Or I miss something important here?

> For now try:
>
>   perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
>   trace
>
> And look at how it manages to decode fds.
I will try, but does 65 still work after
c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d? :-)
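One way to picture the "catch-up" problem with func:offset probes is a
list of candidate definitions that are tried in order, keeping the first
one the running kernel accepts. The sketch below is an assumption for
illustration only: insert_first_working and perf_probe_insert are
made-up helper names, and the getname_flags:66 candidate is hypothetical
(only the :65 definition comes from this thread).

```python
import subprocess

# Candidate definitions, newest guess first. Only the :65 one is from the
# thread; the :66 entry is a hypothetical alternative for another kernel.
CANDIDATES = [
    "vfs_getname=getname_flags:66 pathname=result->name:string",
    "vfs_getname=getname_flags:65 pathname=result->name:string",
]


def perf_probe_insert(definition):
    """Try to add a probe with 'perf probe <definition>'; True on success.

    Needs root, a perf binary in PATH and kernel debuginfo, so it is not
    called in this sketch.
    """
    result = subprocess.run(["perf", "probe", definition])
    return result.returncode == 0


def insert_first_working(candidates, try_insert):
    """Return the first definition accepted by try_insert, else None."""
    for definition in candidates:
        if try_insert(definition):
            return definition
    return None
```

On a real system the call would be
insert_first_working(CANDIDATES, perf_probe_insert); passing the inserter
as a parameter keeps the selection logic testable without touching the
kernel.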
Re: [PATCH v2 0/7] perf trace pagefaults
On Fri, Jun 20, 2014 at 08:18:59PM +0400, Stanislav Fomichev wrote:
> > Hey, haven't you seen the vfs_getname probe? Idea is to hook on where the
> > relevant copy_from_user is done and insert that into the ring buffer, as
> > we already do for mapping fd -> pathname.
> I saw it but didn't actually try because it needs all the debugging stuff
> enabled and in place.

Touché, more on that below...

> > I.e. no need for actual tracepoints from day one, just wannabe
> > tracepoints using whatever probe inserting gizmo the kprobes_tracer used
> > by 'perf probe' now thinks it's best to use.
> But we then need to predefine many probes for decoding to work in the
> form of func:offset, and then play catch-up with all the kernel changes.
> Or I miss something important here?

No you don't.

If we want to disturb the system in the least way possible, we need to
tag along the copying from userspace of those pointers, so that we get
them fresh and just stash it in our ring buffer and get out of the way
quickly.

> > For now try:
> >
> >   perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
> >   trace
> >
> > And look at how it manages to decode fds.
> I will try, but does 65 still work after
> c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d? :-)

Well, when I prototyped this[1] the idea is that in some areas, there is
not that much code flux that before committing to any kind of new
interface, be it tracepoints or something else, we may well just use
'perf probe' to get what we need, and this was done in...

commit 75b757ca90469e990e6901f4a9497fe4161f7f5a
Author: Arnaldo Carvalho de Melo a...@redhat.com
Date:   Tue Sep 24 11:04:32 2013 -0300

Almost a year ago, and it still works, now let's see the cset you
mention...

[acme@zoo linux]$ git describe c4ad8f98bef77c7356aa6a9ad9188a6acc6b849d
v3.14-rc1-14-gc4ad8f98bef7
[acme@zoo linux]$

[root@zoo ~]# uname -r
3.15.0-rc8+

Humm, what is the problem?

[root@zoo ~]# perf probe -V getname_flags:65
Available variables at getname_flags:65
        @getname_flags+227
                char* filename
                int len
                int* empty
                long int max
                struct filename* result
[root@zoo ~]#
[root@zoo ~]# perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
Added new event:
  probe:vfs_getname (on getname_flags:65 with pathname=result->name:string)

You can now use it in all perf tools, such as:

        perf record -e probe:vfs_getname -aR sleep 1

[root@zoo ~]# perf record -e probe:vfs_getname -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.133 MB perf.data (~49505 samples) ]
[root@zoo ~]# perf evlist
probe:vfs_getname
[root@zoo ~]# perf evlist -v
probe:vfs_getname: sample_freq=1, type: 2, config: 1317, size: 96, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, sample_id_all: 1, exclude_guest: 1
[root@zoo ~]# perf script
  perf 11255 [003] 156054.623210: probe:vfs_getname: (811c2e43) pathname=/home/acme/libexec/perf-core/sleep
  perf 11255 [003] 156054.624759: probe:vfs_getname: (811c2e43) pathname=/usr/lib64/qt-3.3/bin/sleep
  perf 11255 [003] 156054.624782: probe:vfs_getname: (811c2e43) pathname=/usr/lib64/ccache/sleep
  perf 11255 [003] 156054.624794: probe:vfs_getname: (811c2e43) pathname=/usr/local/sbin/sleep
  perf 11255 [003] 156054.624809: probe:vfs_getname: (811c2e43) pathname=/usr/local/bin/sleep
  perf 11255 [003] 156054.624818: probe:vfs_getname: (811c2e43) pathname=/sbin/sleep
  perf 11255 [003] 156054.625017: probe:vfs_getname: (811c2e43) pathname=/bin/sleep
  sleep 11255 [002] 156054.626093: probe:vfs_getname: (811c2e43) pathname=/etc/ld.so.preload
  sleep 11255 [002] 156054.626114: probe:vfs_getname: (811c2e43) pathname=/etc/ld.so.cache
  sleep 11255 [002] 156054.626159: probe:vfs_getname: (811c2e43) pathname=/lib64/libc.so.6
  sleep 11255 [002] 156054.626751: probe:vfs_getname: (811c2e43) pathname=/usr/lib/locale/locale-archive
  goa-daemon 2082 [003] 156054.955138: probe:vfs_getname: (811c2e43) pathname=/etc/localtime
  goa-daemon 2082 [003] 156054.955573: probe:vfs_getname: (811c2e43) pathname=/etc/localtime
[root@zoo ~]#

Best possible way to do this? Guess not, but I'm looking from a tooling
perspective, i.e. about using what is available, not about adding
requirements to the kernel or toolchain, that we can do after we
prototype in the best way possible with existing facilities.

- Arnaldo

[1] And I feel like all of tools/perf/ is just that, reference
implementations, but hopefully done in such a way that may well be
useful as-is :-)

-- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to