On 04/14/2010 05:26 AM, Sylvain Geneves wrote:

Here's what I see in the resulting trace directory:

http://pastebin.com/raw.php?i=TqJWarkA

It seems that not all traces are recorded (some metadata and ust files are
zero size). I can't understand why; I must be missing something here...?

-f should fix this.


Also, when using lttv, it says "Cannot open trace : maybe you should
enter in the trace directory to select it ?"
I note that lttv can open a subdirectory (like
/root/.usttraces/californium-20100329172748911948191/8494_5454077942969735172
in my example), but obviously all I can see there is a subset of what really
happened...

I just looked at this more closely.

Thanks for your help.


If your program does not fork (nor clone without the CLONE_VM flag),
it's still possible you are tracing several processes. This can happen
when the command you pass to usttrace is not an ELF executable but a
shell script that runs other commands, each of which starts a new
process. For example, a shell script that uses find, sed, ls, etc. will
start each of them in its own process, and the tracer will trace
everything. To find out what these processes are, you could run the same
command under strace -f and see what is going on.
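
For instance, something like this should show any extra processes
(./myapp is just a placeholder for your command):

strace -f -e trace=process -o strace.log ./myapp
grep -E 'clone|fork|execve' strace.log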


The command I launch is actually an executable, not a script. What it
does is quite simple:
- the application registers some events to be processed
- it creates one thread per core on the machine
- each thread runs a loop that processes events
After all events have been processed, each created thread calls
pthread_exit, and the application exits normally.
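
For illustration only, here is a minimal sketch of that kind of structure;
the names, counts and marker calls are assumptions (trace_mark() from
<ust/marker.h>, as in the ust manual), not the actual application. It
would be compiled with -pthread and linked against libust:

#include <pthread.h>
#include <unistd.h>
#include <ust/marker.h>

#define EVENTS_PER_THREAD 1000   /* assumed workload size */

static void *worker(void *arg)
{
    long id = (long)arg;
    int i;

    /* each thread runs a loop that processes (here: just marks) events */
    for (i = 0; i < EVENTS_PER_THREAD; i++)
        trace_mark(ust, event_processed, "thread %ld event %d", id, i);

    pthread_exit(NULL);
}

int main(void)
{
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);  /* one thread per core */
    long i;

    if (ncores < 1)
        ncores = 1;

    pthread_t threads[ncores];

    for (i = 0; i < ncores; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (i = 0; i < ncores; i++)
        pthread_join(&threads[i], NULL);

    return 0;   /* normal exit, no explicit trace stop */
}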

Interesting, I have a very similar program I use for benchmarking ust and it works well.


The output of strace -f shows that there is no call to fork in there,
but clone is called multiple times.

That leads me to a question about how to correctly enable the CLONE_VM
flag for ust: is enabling it through the shell environment (export
CLONE_VM=1) sufficient? Is there another way to do it?

CLONE_VM is a flag you pass to the clone() call, not an environment variable. If you don't pass it, clone() acts more like fork() in the sense that it creates a new address space for the new task. If you do pass it, the same address space is reused, effectively creating a thread. If you are using pthreads, then pthread is already passing CLONE_VM; you can actually see it with strace.
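
A small sketch of the distinction, using the glibc clone() wrapper (this
little program is illustrative only and is not related to ust itself):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
    printf("child running, pid=%d\n", getpid());
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);

    if (!stack)
        return 1;

    /* With CLONE_VM the child shares our address space (thread-like;
     * pthread_create passes CLONE_VM plus CLONE_THREAD and friends).
     * Drop CLONE_VM and the kernel gives the child its own copy of the
     * address space, which is fork()-like behaviour. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}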

The reason I was suggesting strace is to find out what all the processes we are seeing actually are. Are you using the -l option to usttrace? That could be another cause; if so, try running without it, it will simplify things.



The fact that some of the files are empty means that the daemon was
unable to connect to the processes at some point; indeed, you are
getting errors printed about this. But it is initially able to connect,
since we see at least one file that is not empty. This is unlikely to be
simply because the process stops existing too quickly: that could happen
because of an exec() or an exit(), but there is a keep-alive mechanism
(triggered by ustctl -f in the case of the exec) that delays these calls
when ustd has not yet had time to connect to all the buffers.

So either there is a bug in this mechanism, or something crashes; it
could be a bug in your program or in ust. When the traced program
crashes, only the buffers that were already connected can be recovered.
Enabling core dumps (ulimit -c) and checking whether there are core
files in the directory after execution could help you find out if you
have segfaults or similar crashes.
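
For example (./myapp is again a placeholder for the traced program):

ulimit -c unlimited      # allow core files to be written
usttrace ./myapp         # run under the tracer as usual
ls core*                 # a core file here would mean something crashed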

That's a good point.
Nothing crashes explicitly (to be sure, I ran my program after "ulimit
-c unlimited" and there was no resulting core dump).


Did you have a look at ustd.log and app.log in the trace directory? They
could give additional information.

I had not looked at them closely enough; there are some interesting lines
near the end of the files:

In ustd.log, when grepping for Error I get multiple lines like these:


ustd[17643/17973]: Error: unable to parse response to get_pidunique (in connect_buffer() at ustd.c:268)
ustd[17643/17973]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
libustcomm[17643/17995]: Error: connect (path=/tmp/ust-app-socks/17992): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)

The "unable to parse response to get_pidunique" is not supposed to happen. This would certainly result in empty files. Perhaps add some code to print the string that is received (near ustd.c:268). It would help debug.



When grepping for "die" there are a lot of:

ustd[17643/17710]: application died while putting subbuffer (in consumer_loop() at ustd.c:513)
ustd[17643/17687]: app died while being traced (in get_subbuffer() at ustd.c:75)
ustd[17643/17745]: For buffer metadata_2, the trace was not found. This
llibustcomm[17643/17747]: sent message "put_subbuffer ust_0 0" (in
send_messagustd[17643/17745]: application died while putting subbuffer (in consumer_loop() at ustd.c:513)
ustd[17643/17995]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)


In app.log, I can see some lines like this:

libust[17924/17928]: Cannot find trace. It was likely destroyed by the user. (in do_cmd_put_subbuffer() at tracectl.c:717)



Sorry not to have seen this before...
From what I understand, ust is seeing my application die unexpectedly.
That seems weird to me because the application appears to exit
gracefully (no segfault or anything else). What could I do to check
that more thoroughly?
Are there some ust calls to add to my application to notify ust that we
will stop tracing, or that a thread is about to exit?


Most of the messages you pasted are actually not problems. Anything that does not start with "Error" or "Warning" is displayed only because you are in debug mode, and it is being handled gracefully. Yes, ust finds that the program dies "unexpectedly" because the trace is still running when it exits. There is an ltt_trace_stop() function you could call to stop the trace, but I do not recommend it: when the daemon detects that the process has ended, buffer recovery kicks in and that's fine. Anyway, the problem seems not to be there, but rather at buffer connection time.

Concerning the "Warning: unable to connect to process, it probably died before we were able to connect", that is really problematic because it means the buffers files will be empty. To understand what is going on there it would probably help to have the full log files.

Are you able to trace a small program with just one marker like the example in the manual?
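
Something along these lines would do as a minimal test, in the spirit of
the manual's example (the identifiers here are assumptions; build it
linked against libust and run it under usttrace):

#include <ust/marker.h>

int main(void)
{
    /* a single marker: if this shows up in the trace, the basic setup works */
    trace_mark(ust, hello, "value %d", 42);
    return 0;
}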

Did you update to the latest git to get the -fno-strict-aliasing?

pmf
