Em Wed, Jul 15, 2015 at 05:41:53PM +0200, Milian Wolff escreveu:
> On Wednesday 15 July 2015 11:53:09 Arnaldo Carvalho de Melo wrote:
> > Executive summary:
> > 
> > . Check the latest sources, i.e. 4.2-rc
> 
> I could also try to compile perf from git, if you tell me what branch and 
> remote I should use.

The branch I use for new features, for the next merge window (the kernel
one, that we tag along) is here:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git  perf/core

> > . Some of the stuff you report was implemented or fixed already

> > . Details about some of the limitations and advantages of 'perf trace' below

> > . Including an auto-tracing that will fix one of the problems you reported
> 
> Nice, thanks for that!
> 
> <snip>
> 
> > [acme@zoo ~]$ trace -e file cat file
> > Error:      Invalid syscall file
> > Hint:       try 'perf list syscalls:sys_enter_*'
> > Hint:       and: 'man syscalls'
 
> Just to be clear: Supporting the strace groups as a collection of syscalls 
> will be implemented? Or is that not possible, as it goes straight down to the 
> perf event subsystem which does not know this group?

Right, I'll implement a strlist__new_with_expansions(), to break a CSV
list and look for some specific keys, that when found will expand into
other keys, like what is done for strace -e file, i.e. file becomes
open,write,close, making sure there are no duplicates.

Then it will be just a matter of providing the expansors for those
strace groups.

> <snip>
> 
> > Ok, that was good, using trace to trace trace and fix it, after lunch I'll
> > give it a try.
> 
> Cool, so you'll add that internally such that the hoops are not necessary on 
> the user side to get the filename? Excellent! I also notice that 

Yes, one other plan is to get bits and pieces of 'perf probe' to be more usable
by other tools, so that we can add these probes, but this is fragile, being
useful just for some areas that are not that much in flux, this vfs_getname
one, for isntance has changed multiple times, i.e. the place where we have to
insert the probe and also the variable name we need to get (the one where the
copy from userspace was already done, etc).

> close/read/write and others /sometimes/ write out the filename:
> 
>     10.404 ( 0.001 ms): kmimetypefinde/8217 read(fd: 6</etc/localtime>, buf: 
> 0x7ffe2b6d1319, count: 15671         ) = 0
> 
>     10.529 ( 0.004 ms): kmimetypefinde/8494 read(fd: 6, buf: 0x7ffc61c45b40, 
> count: 8192                          ) = 4
> 
> This is also pretty odd.

Well, when it intercepts the vfs_getname it stores the filename in a per
thread storage, then, when it gets an open syscall return, the fd, it
associates, in an array, per thread, the filename to the fd, and then,
when that fd appears again in another syscall, it expands it, showing
the name.

When vfs_getname is not used, it still works, to some degree, but by generating
some noise in the system by reading the /proc/PID/fd/ symlinks, which is racy,
etc.

Which reminds me what needs to be done for open: to defer printing the
syscall args till we get the vfs_getname tracepoint, when we realize
what is the filename, matching the pointer to the contents... But that
has some complications, as when open itself is interrupted by some other
event... I.e. we need to somehow indicate that, for instance:

 0.605 (0.002 ms): cat/13780 open(filename: 0x78a220, mode: 7905616             
                   ) ...
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) 
pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open()) = -1 ENOENT No such file 
or directory

I.e. normally we print the first line with what is in the
raw_syscall:sys_enter, and that is just the filename pointer, as you
see above, but then, we will do the association of that pointer to
its contents ("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo") when
we get that probe:vfs_getname...

 0.605 (0.002 ms): cat/13780 open(filename: 0x78a220, mode: 7905616             
                   ) ...
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) 
pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open()) = -1 ENOENT No such file 
or directory

We could transform that into:

 0.605 (0.002 ms): cat/13780 open( ... interrupted
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) 
pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open(filename = 
"/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", mode: 7905616)) = -1 ENOENT No 
such file or directory

That, when no interruption happens, i.e. no event from syscall enter to syscall
exit happens, i.e. when "probe:vfs_getname" isn't one of the events select, for
instance,  would result into:

 0.605 (0.002 ms): cat/13780 open(filename = 
"/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", mode: 7905616)) = -1 ENOENT No 
such file or directory

Now we need just to write the code for that ;-)

> <snip>
> 
> > > O_RDONLY flag is not properly handled (compare to the strace output
> > > above).
> > 
> > It is zero, right? yeap:
> > 
> > #define O_RDONLY        00000000
> > 
> > syscall arguments with a zero payload are removed trying to make the output
> > more compact, and also for cases where some of the args are optional, to
> > reduce the logic needed to know when things are optional, just don't print
> > zeroes.
> > 
> > This shortcut has issues, as in this case, where an exception seems to be
> > needed, i.e. we should print O_RDONLY, d'accord?
> 
> Yes, that would be good to have here, I guess. In general, the closer the 
> output gets to `strace`, the better.

Ok.
 
- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to