Actually, some more testing and I found what I think is either a bug, 
or perhaps a mis-feature. The problem has nothing to do with a race 
between the commit and the discard. It has all to do with commit 
processing in an END clause.

The missing processes always have their missing speculative data 
having been recorded on a cpu that different from the one executing 
the END clauses. When I do a commit of a speculation in an END clause, 
I never, ever, see any of the speculative data that was recorded on 
any other cpu. From this fact and from what I said in my last message 
about when speculative buffers from other cpus get copied into the 
principle buffer, I will speculate (heh) that the root cause is that 
after the END clauses finish, dtrace exits and thus never gives itself 
  a chance to do the copying. According to the docs, it does this 
copying at a rate set by the "cleanrate". My guess is that dtrace 
exits and does no more copying after the last END clause finishes.

I tried putting in one last END clause with a chill in it that was 
longer than the cleanrate, put that didn't help. I suspect that the 
chill happens in the kernel context and the other stuff is in the 
dtrace context, so the chill always executes first, even though it is 
positionally last. In fact, it had the interesting side effect of 
stopping any  data from being printed out at all in my END clauses. I 
suspect that the de-allocation of some the variables had already 
occurred. Boy, the END processing can get interesting.

Brian Utterback wrote:
> Ah, Pramod was right, for at least part of my problem. I just re-read 
> the chapter on speculative tracing, and it says that when a commit is 
> done, the speculative buffer from the cpu that does the commit is 
> immediately copied to the principle buffer, with the speculative 
> buffer from other cpus copied sometime later. This greatly increases 
> the chances of the the discard happening before the buffer is copied, 
> but after the value of ts is examined. It still doesn't explain the 
> missing processes though.
> 
> I worked around this problem by having a global variable on the clause 
> with the discard. In the very first END clause, I change the global so 
> the clause with the discard is thus always false and no more discard 
> occur. This got rid of all the empty output. Are global variables 
> coherent across CPUs?
> 
> Brian Utterback wrote:
>> Actually, I thought that there might be some kind of race condition 
>> like that, but that really can't be it. See the time stamps? 
>> Speculation 5 was set 9 seconds earlier, and speculation 6 was set 1.6 
>>   seconds earlier. The point is that I am seeing a very small number 
>> of syspoll calls and they tend to wait a relatively long time. I could 
>> see getting that kind of race once in awhile at that rate, or even all 
>> of the time if the rate of pollsys calls was very high, but that isn't 
>> the case. Also, there is the one process that I know is waiting in 
>> syspoll that doesn't show up here at all.
>>
>> Pramod Batni wrote:
>>>
>>>>    1      2    :END         5        9163
>>>> ===================================================
>>>>
>>>>
>>>>    1      2    :END         6        1686
>>>> ===================================================
>>>>
>>>>
>>>>
>>>> See the problem? In this case speculations 5 and 6 both have non-zero 
>>>>   ts array entries (or they would not have printed) but there is 
>>>> nothing in the speculation buffer (otherwise there would have been the 
>>>> first set of traces printed). Any ideas why this is happening?
>>>>   
>>> Consider the following 2 probes firing on 2 different cpu's at 
>>> approximately the same time.
>>>> } \
>>>> END \
>>>> /ts[this->spec]/ \
>>>> { \
>>>>          commit(this->spec); \
>>>>          this->spec++; \
>>>> }
>>>>
>>>>
>>>>
>>>> syscall::pollsys:return
>>>> /self->spec/
>>>> {
>>>>          ts[self->spec] = 0;
>>>>          discard(self->spec);
>>>>          self->spec = 0;
>>>> }
>>>>
>>>>   
>>> The interleaved execution of the predicate and the clause could lead to 
>>> the kind of output you are observing.
>>> The first probe would find ts[this->spec] to be non-zero and before it 
>>> can execute 'commit(this->spec)' the second
>>> probe would fire and execute 'discard(self->spec)' . The contents of the 
>>> speculative buffer are discarded and the 'commit(this->spec)'
>>> would become a nop.
>>> Such a sequence could explain the missing data for speculation 5 and 6.
>>>
>>>
>>> Pramod
> 

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom
_______________________________________________
dtrace-discuss mailing list
[email protected]

Reply via email to