Is it possible that speculation buffers are per CPU? I ran a program 
that reads data from a socket and mostly juts sits in pollsys. When I 
ran the program I enclosed, it often did not print out anything, even 
though I know it went into pollsys. I even changed the tests to print 
out all of the speculation buffers without regard to the setting of 
the ts variable, and I still did not see anything, but I almost always 
had indices where the ts variable was non-zero but the commit didn't 
print anything. I thought maybe the speculation buffer might need to 
be on the samne CPU or something.

This is really a problem, I have to be sure I print out the hung 
threads. It takes a while for the processes and I can't keep going 
back and saying, "well, that script didn't get it, try this one."


Brian Utterback wrote:
> A while back, I asked how to detect speculations that didn't complete. 
> At that time, I was told that the return value of speculation() is 
> simply an integer in the range from 1 up to the value of nspec, and so 
> you could tap into any speculation by using an integer.
> 
> Here was and is my problem. I have a customer who have a call to the 
> connect syscall that hangs and never returns. This is using UNIX 
> domain sockets, so that really shouldn't happen at all. I am trying to 
> find out what is different about the threads that hang. So, I create a 
> speculation and trace the thread. The only thing is, how do I commit 
> the speculation based on something that doesn't happen?
> 
> The suggestion at the time was to place a timestamp into an array when 
> the call starts, and then use the tick-100hz trick to iterate over the 
> array, looking for timestamps above some threshold in the past. When I 
> found one, use the index as the speculation number and commit it.
> 
> That sounded good, but it didn't work. I thought maybe I was getting 
> too clever with the tick trick and all that, so based on a message 
> earlier this week, I though about simplifying the whole thing using 
> END probes and unrolling the loop. You see, the customer can see the 
> hung processes, so I figured if I committed all of the outstanding 
> speculations when the DTrace script is interrupted, that would be just 
> as good and much simpler, less chance something could go wrong.
> 
> So, just to test it out before giving it to my customer, I wrote a 
> script that uses the pollsys command and traces the kernel threads 
> that are in pollsys at any given moment, and also it prints a stack 
> trace when it goes into the resume call (which is the last kernel 
> function called in pollsys when it blocks).
> 
> I wrote a macro to print out each speculation, along with how many 
> milliseconds it had been in the pollsys call. I stored the timestamp 
> for each speculation for when it was created, and then printed out the 
> difference when the script is interrupted. I used a this variable to 
> iterate through the list of speculations.
> 
> It seems to work, except for some reason I get output for speculations 
> with no speculation buffer. Here is the script:
> 
> #!/usr/sbin/dtrace  -Cs
> 
> #pragma D option nspec=16
> 
> #define printspec       END \
> /ts[this->spec]/ \
> { \
>          speculate(this->spec); \
>          trace(this->spec); \
>          trace((timestamp-ts[this->spec])/1000000); \
>  
> trace("\n===================================================\n\n"); \
> } \
> END \
> /ts[this->spec]/ \
> { \
>          commit(this->spec); \
>          this->spec++; \
> }
> 
> long ts[17];
> 
> syscall::pollsys:entry
> {
>          self->spec = speculation();
>          speculate(self->spec);
>          ts[self->spec] = timestamp;
>          trace(self->spec);
>          trace(tid);
>          trace(pid);
>          trace(execname);
> }
> 
> fbt::resume:entry
> /self->spec/
> {
>          speculate(self->spec);
>          stack();
> }
> 
> syscall::pollsys:return
> /self->spec/
> {
>          ts[self->spec] = 0;
>          discard(self->spec);
>          self->spec = 0;
> }
> 
> END{
>          trace("end");
>          trace(timestamp);
>          this->spec = 1;
> }
> 
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> printspec
> 
> 
> And here is the output after 15 seconds:
> 
> dtrace: script './test1.d' matched 36 probes
> ^C
> CPU     ID    FUNCTION:NAME
>    1      2    :END   end      2071629828966800
>    1   7359    pollsys:entry    1   1 19878  snmpd
>    1  11509    resume:entry
>                genunix`cv_timedwait_sig+0x160
>                genunix`cv_waituntil_sig+0x84
>                genunix`poll_common+0x498
>                genunix`pollsys+0xcc
>                genunix`dtrace_systrace_syscall+0xa8
>                unix`syscall_trap+0xac
> 
>    1      2    :END         1         675
> ===================================================
> 
> 
>    1   7359    pollsys:entry    2   1 550  sendmail
>    1  11509    resume:entry
>                genunix`cv_timedwait_sig+0x160
>                genunix`cv_waituntil_sig+0x84
>                genunix`poll_common+0x498
>                genunix`pollsys+0xcc
>                genunix`dtrace_systrace_syscall32+0xa8
>                unix`syscall_trap32+0xcc
> 
>    1      2    :END         2        4748
> ===================================================
> 
> 
>    1   7359    pollsys:entry    3   1  456  dhcpagent
>    1  11509    resume:entry
>                genunix`cv_timedwait_sig+0x160
>                genunix`cv_waituntil_sig+0x84
>                genunix`poll_common+0x498
>                genunix`pollsys+0xcc
>                genunix`dtrace_systrace_syscall32+0xa8
>                unix`syscall_trap32+0xcc
> 
>    1      2    :END         3         818
> ===================================================
> 
> 
>    1   7359    pollsys:entry    4   12   911  java
>    1  11509    resume:entry
>                genunix`cv_timedwait_sig+0x160
>                genunix`cv_waituntil_sig+0x84
>                genunix`poll_common+0x13c
>                genunix`pollsys+0xcc
>                genunix`dtrace_systrace_syscall32+0xa8
>                unix`syscall_trap32+0xcc
> 
>    1      2    :END         4          29
> ===================================================
> 
> 
>    1   7359    pollsys:entry    5   14   911  java
>    1  11509    resume:entry
>                genunix`cv_timedwait_sig+0x160
>                genunix`cv_waituntil_sig+0x84
>                genunix`poll_common+0x13c
>                genunix`pollsys+0xcc
>                genunix`dtrace_systrace_syscall32+0xa8
>                unix`syscall_trap32+0xcc
> 
>    1      2    :END         5        9163
> ===================================================
> 
> 
>    1      2    :END         6        1686
> ===================================================
> 
> 
> 
> See the problem? In this case speculations 5 and 6 both have non-zero 
>   ts array entries (or they would not have printed) but there is 
> nothing in the speculation buffer (otherwise there would have been the 
> first set of traces printed). Any ideas why this is happening?

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
_______________________________________________
dtrace-discuss mailing list
[email protected]

Reply via email to