On 04/14/2010 06:02 PM, Pierre-Marc Fournier wrote:
On 04/14/2010 05:26 AM, Sylvain Geneves wrote:
Here's what i see in the resulting trace directory:
http://pastebin.com/raw.php?i=TqJWarkA
it seems that not all traces are recorded (some metadata and ust files
are zero size); I can't understand why, so I must be missing something
here...?
-f should fix this.
Also, when using lttv, it says "Cannot open trace : maybe you should
enter in the trace directory to select it ?"
I note that lttv can open a subdirectory (like
/root/.usttraces/californium-20100329172748911948191/8494_5454077942969735172
in my example), but obviously all I can see then is a subset of what
really happened...
I just looked at this more closely.
Thanks for your help.
If your program does not fork (nor clone without the CLONE_VM flag),
it's still possible you are tracing several processes. This can happen
when the command you pass to usttrace is not an ELF executable but
rather a shell script that runs other commands, each of which starts a
new process. For example, a shell script that uses find, sed, ls, etc.
will start each of them in its own process. The tracer will trace
everything. To find out what these processes are, you could run the
same command under strace -f and see what is going on.
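For example (a sketch; ./myprog is a placeholder for your actual command):

$ strace -f -e trace=fork,vfork,clone,execve ./myprog 2> strace.log
$ grep -c clone strace.log

The -f option follows child processes, and -e trace= restricts the output
to the process-creation calls, which makes it easier to see how many tasks
are started and by whom.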
The command I launch is actually an executable, not a script. What it
does is quite simple:
- the application registers some events to be processed
- it creates one thread per core on the machine
- each thread runs a loop that processes events
After all events have been processed, each created thread calls
pthread_exit, and the application exits normally.
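For illustration, a rough sketch of that structure (hypothetical names,
not the actual benchmark code) could look like this:

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>

/* hypothetical worker: each thread processes events in a loop, then exits */
static void *worker(void *arg)
{
	/* ... process events until none are left ... */
	pthread_exit(NULL);
}

int main(void)
{
	long ncores = sysconf(_SC_NPROCESSORS_ONLN);
	pthread_t *threads = malloc(ncores * sizeof(*threads));
	long i;

	/* register the events to be processed (application-specific) */

	for (i = 0; i < ncores; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (i = 0; i < ncores; i++)
		pthread_join(threads[i], NULL);

	free(threads);
	return 0;
}

With a program of this shape, strace -f should show one clone() per
pthread_create, all with CLONE_VM set.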
Interesting, I have a very similar program I use for benchmarking ust
and it works well.
The output of strace -f shows that there is no call to fork in there,
but clone is called multiple times.
That leads me to a question about how to correctly enable the CLONE_VM
flag for ust: is setting it through the shell environment (export
CLONE_VM=1) sufficient? Is there another way to do it?
CLONE_VM is a flag you pass to the clone() system call, not an
environment variable, so exporting it in the shell has no effect. If you
don't pass it, clone() acts more like fork() in the sense that it
creates a new address space for the new task. If you do pass it, the
same address space is reused, effectively creating a thread. If you are
using pthreads, then the pthread library is passing CLONE_VM; you can
actually see it with strace.
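For illustration only (not something your program needs), a minimal
sketch of a raw clone() call that shares the address space looks roughly
like this; pthread_create does the equivalent under the hood, with more
flags:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>

static char child_stack[65536] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
	/* runs in the same address space as the parent because of CLONE_VM */
	return 0;
}

int main(void)
{
	/* With CLONE_VM the child shares the parent's address space;
	 * without it, clone() creates a separate copy, much like fork(). */
	pid_t pid = clone(child_fn, child_stack + sizeof(child_stack),
			  CLONE_VM | SIGCHLD, NULL);

	if (pid == -1) {
		perror("clone");
		exit(1);
	}
	waitpid(pid, NULL, 0);
	return 0;
}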
Thanks for your explanation. I only use pthread_create and don't call
clone() directly, so it shouldn't be a problem.
The reason I was suggesting you use strace is to find out what all the
processes we are seeing are. Are you using the -l option to usttrace?
This could be another cause. If so, try to stop using it; it will
simplify things.
Indeed, I was using the -l flag with usttrace; removing it simplifies
the output a lot!
Here's what I get when I list the trace directory without the -l option:
$ls ~/.usttraces/californium-20100415164805348661147/
14989_5460376113077649648 app.log ustd.log
The only error message I see now is the following:
sh: symbol lookup error: /usr/local/lib/libustfork.so: undefined symbol:
ust_before_fork
The symlink is valid and points to the library.
$readelf -a /usr/local/lib/libustfork.so.0.0.0 | grep ust_before_fork
00001ba4 00000607 R_386_JUMP_SLOT 00000000 ust_before_fork
6: 00000000 0 NOTYPE GLOBAL DEFAULT UND ust_before_fork
64: 00000000 0 NOTYPE GLOBAL DEFAULT UND ust_before_fork
It seems that the function is referenced somewhere, but not defined?
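(Side note: the UND entries only mean that libustfork.so imports the
symbol; it is expected to be resolved from another library, presumably
libust.so, at run time. One quick way to check where it is actually
exported is something like:

$ nm -D /usr/local/lib/libust.so | grep ust_before_fork

where the libust.so path is an assumption; an empty result there would
be consistent with the "undefined symbol" error.)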
The fact that some of the files are empty means that the daemon was
unable to connect to the processes at some point. Indeed, you are
getting printed errors about this. But it is initially able to connect,
because we can see there is at least one file that is not empty. This is
unlikely to be because the process stops existing too quickly. That
could happen because of an exec() or an exit(), but there is a
keep-alive mechanism (triggered by ustctl -f in the case of exec) that
delays these calls when ustd has not yet had time to connect to all the
buffers.
Either there is a bug in this mechanism, or something crashes. It could
be a bug in your program or in ust. When the traced program crashes,
only the buffers that were already connected can be recovered. Enabling
core dumps (ulimit -c) and checking whether there are core files in the
directory after execution could help find out if you have segfaults or
similar crashes.
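For example (a sketch; ./myprog is a placeholder):

$ ulimit -c unlimited
$ usttrace ./myprog
$ ls core*

If a core file shows up, opening it with gdb ./myprog core would show
where the crash happened.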
That's a good point.
Nothing crashes explicitly (to be sure, I ran my program after "ulimit
-c unlimited" and there was no resulting core dump).
Did you have a look at ustd.log and app.log in the trace directory? They
could give additional information.
I had not looked at them closely enough; there are some interesting
lines near the end of the files.
In ustd.log, when grepping for Error, I get multiple lines like this:
ustd[17643/17973]: Error: unable to parse response to get_pidunique (in
connect_buffer() at ustd.c:268)
ustd[17643/17973]: Error: failed to connect to buffer (in
consumer_thread() at ustd.c:581)
libustcomm[17643/17995]: Error: connect (path=/tmp/ust-app-socks/17992):
Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
The "unable to parse response to get_pidunique" is not supposed to
happen. This would certainly result in empty files. Perhaps add some
code to print the string that is received (near ustd.c:268). It would
help debug.
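For example, something along these lines near ustd.c:268 (hypothetical;
the actual variable holding the reply has to be substituted):

	/* debug: dump the raw reply received for get_pidunique */
	fprintf(stderr, "get_pidunique reply: \"%s\"\n", received_msg);

where received_msg is a placeholder for whatever buffer holds the
response at that point.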
Apparently, I can't reproduce it... Here are the only errors left when
using the -l flag with the latest version of ust:
libustcomm[19070/19396]: Error: connect (path=/tmp/ust-app-socks/19391):
Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[19070/19398]: Error: failed to connect to buffer (in
consumer_thread() at ustd.c:582)
ustd[19070/19396]: Error: failed to connect to buffer (in
consumer_thread() at ustd.c:582)
ustd[19070/19397]: Error: problem in ustcomm_send_request(get_shmid) (in
connect_buffer() at ustd.c:280)
ustd[19070/19397]: Error: failed to connect to buffer (in
consumer_thread() at ustd.c:582)
When grepping for "die", there are a lot of lines like:
ustd[17643/17710]: application died while putting subbuffer (in
consumer_loop() at ustd.c:513)
ustd[17643/17687]: app died while being traced (in get_subbuffer() at
ustd.c:75)
ustd[17643/17745]: For buffer metadata_2, the trace was not found. This
llibustcomm[17643/17747]: sent message "put_subbuffer ust_0 0" (in
send_messagustd[17643/17745]: application died while putting subbuffer
(in consumer_loop() at ustd.c:513)
ustd[17643/17995]: Warning: unable to connect to process, it probably
died before we were able to connect (in connect_buffer() at ustd.c:250)
In app.log, I can see some lines like this:
libust[17924/17928]: Cannot find trace. It was likely destroyed by the
user. (in do_cmd_put_subbuffer() at tracectl.c:717)
Sorry not to have seen this before...
From what I understand, it seems ust is seeing my application die
unexpectedly. That seems weird to me because the application appears to
exit gracefully (no segfault or anything else). What could I do to check
that more thoroughly?
Are there some ust calls I should add to my application to notify ust
that we will stop tracing, or that a thread will exit?
Most of the messages you pasted are actually not problems. Anything
that does not start with "Error" or "Warning" is displayed only because
you are in debug mode, and is being handled gracefully. Yes, ust finds
that the program dies "unexpectedly" because the trace is still running.
There is an ltt_trace_stop() function you could call to stop the trace,
but I do not recommend it. When the daemon detects that the process has
ended, buffer recovery kicks in and that's fine. Anyway, the problem
does not seem to be there, but rather at buffer connection time.
Concerning the "Warning: unable to connect to process, it probably died
before we were able to connect", that is really problematic because it
means the buffer files will be empty. To understand what is going on
there, it would probably help to have the full log files.
Are you able to trace a small program with just one marker like the
example in the manual?
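For reference, a minimal one-marker program in the spirit of the manual
example could look roughly like this (from memory of the ust 0.x API;
the exact header and macro names should be checked against the installed
ust/marker.h):

#include <ust/marker.h>

int main(void)
{
	int v = 42;

	/* a single marker/event on the default "ust" channel */
	trace_mark(ust, myevent, "value %d", v);
	return 0;
}

Built with something like gcc -o onemarker onemarker.c -lust and run
under usttrace, it should produce a small trace with just that one
event.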
I should have tried that in the first place; I will do it as soon as I
can.
Did you update to the latest git to get the -fno-strict-aliasing flag?
I'm sorry I didn't have time to do this until now; everything compiles
fine now :)
Thanks,
Sylvain
pmf
_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev