A while back, I asked how to detect speculations that didn't complete. 
At that time, I was told that the return value of speculation() is 
simply an integer in the range from 1 up to the value of nspec, and so 
you could tap into any speculation by using an integer.

Here was and is my problem. I have a customer who have a call to the 
connect syscall that hangs and never returns. This is using UNIX 
domain sockets, so that really shouldn't happen at all. I am trying to 
find out what is different about the threads that hang. So, I create a 
speculation and trace the thread. The only thing is, how do I commit 
the speculation based on something that doesn't happen?

The suggestion at the time was to place a timestamp into an array when 
the call starts, and then use the tick-100hz trick to iterate over the 
array, looking for timestamps above some threshold in the past. When I 
found one, use the index as the speculation number and commit it.

That sounded good, but it didn't work. I thought maybe I was getting 
too clever with the tick trick and all that, so based on a message 
earlier this week, I though about simplifying the whole thing using 
END probes and unrolling the loop. You see, the customer can see the 
hung processes, so I figured if I committed all of the outstanding 
speculations when the DTrace script is interrupted, that would be just 
as good and much simpler, less chance something could go wrong.

So, just to test it out before giving it to my customer, I wrote a 
script that uses the pollsys command and traces the kernel threads 
that are in pollsys at any given moment, and also it prints a stack 
trace when it goes into the resume call (which is the last kernel 
function called in pollsys when it blocks).

I wrote a macro to print out each speculation, along with how many 
milliseconds it had been in the pollsys call. I stored the timestamp 
for each speculation for when it was created, and then printed out the 
difference when the script is interrupted. I used a this variable to 
iterate through the list of speculations.

It seems to work, except for some reason I get output for speculations 
with no speculation buffer. Here is the script:

#!/usr/sbin/dtrace  -Cs

#pragma D option nspec=16

#define printspec       END \
/ts[this->spec]/ \
{ \
         speculate(this->spec); \
         trace(this->spec); \
         trace((timestamp-ts[this->spec])/1000000); \
 
trace("\n===================================================\n\n"); \
} \
END \
/ts[this->spec]/ \
{ \
         commit(this->spec); \
         this->spec++; \
}

long ts[17];

syscall::pollsys:entry
{
         self->spec = speculation();
         speculate(self->spec);
         ts[self->spec] = timestamp;
         trace(self->spec);
         trace(tid);
         trace(pid);
         trace(execname);
}

fbt::resume:entry
/self->spec/
{
         speculate(self->spec);
         stack();
}

syscall::pollsys:return
/self->spec/
{
         ts[self->spec] = 0;
         discard(self->spec);
         self->spec = 0;
}

END{
         trace("end");
         trace(timestamp);
         this->spec = 1;
}

printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec
printspec


And here is the output after 15 seconds:

dtrace: script './test1.d' matched 36 probes
^C
CPU     ID    FUNCTION:NAME
   1      2    :END   end      2071629828966800
   1   7359    pollsys:entry    1   1 19878  snmpd
   1  11509    resume:entry
               genunix`cv_timedwait_sig+0x160
               genunix`cv_waituntil_sig+0x84
               genunix`poll_common+0x498
               genunix`pollsys+0xcc
               genunix`dtrace_systrace_syscall+0xa8
               unix`syscall_trap+0xac

   1      2    :END         1         675
===================================================


   1   7359    pollsys:entry    2   1 550  sendmail
   1  11509    resume:entry
               genunix`cv_timedwait_sig+0x160
               genunix`cv_waituntil_sig+0x84
               genunix`poll_common+0x498
               genunix`pollsys+0xcc
               genunix`dtrace_systrace_syscall32+0xa8
               unix`syscall_trap32+0xcc

   1      2    :END         2        4748
===================================================


   1   7359    pollsys:entry    3   1  456  dhcpagent
   1  11509    resume:entry
               genunix`cv_timedwait_sig+0x160
               genunix`cv_waituntil_sig+0x84
               genunix`poll_common+0x498
               genunix`pollsys+0xcc
               genunix`dtrace_systrace_syscall32+0xa8
               unix`syscall_trap32+0xcc

   1      2    :END         3         818
===================================================


   1   7359    pollsys:entry    4   12   911  java
   1  11509    resume:entry
               genunix`cv_timedwait_sig+0x160
               genunix`cv_waituntil_sig+0x84
               genunix`poll_common+0x13c
               genunix`pollsys+0xcc
               genunix`dtrace_systrace_syscall32+0xa8
               unix`syscall_trap32+0xcc

   1      2    :END         4          29
===================================================


   1   7359    pollsys:entry    5   14   911  java
   1  11509    resume:entry
               genunix`cv_timedwait_sig+0x160
               genunix`cv_waituntil_sig+0x84
               genunix`poll_common+0x13c
               genunix`pollsys+0xcc
               genunix`dtrace_systrace_syscall32+0xa8
               unix`syscall_trap32+0xcc

   1      2    :END         5        9163
===================================================


   1      2    :END         6        1686
===================================================



See the problem? In this case speculations 5 and 6 both have non-zero 
  ts array entries (or they would not have printed) but there is 
nothing in the speculation buffer (otherwise there would have been the 
first set of traces printed). Any ideas why this is happening?
-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
_______________________________________________
dtrace-discuss mailing list
[email protected]

Reply via email to