Adam,

That's an interesting analysis. Hope we can have a fix soon.

It also makes me wonder: how many races of that kind are there? And is
it possible to have a comprehensive fix or approach to handle stuff like
this?

Thanks,

Oleksandr Lytvyn
Morgan Stanley | Technology
210 Carnegie Center, 4th Floor | Princeton, NJ  08540
Phone: +1 609 936-4026
Mobile: +1 732 773-4145
[EMAIL PROTECTED]
 

> -----Original Message-----
> From: Adam Leventhal [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 21, 2008 2:00 PM
> To: Lytvyn, Oleksandr (IT)
> Cc: dtrace-discuss@opensolaris.org
> Subject: Re: [dtrace-discuss] whatfor.d -- where's null pointer?
> 
> Hi Oleksandr,
> 
> This turned out to be a rather interesting problem. To investigate, I used
> this script:
> 
> ---8<---
> off-cpu
> {
>       /* Evaluate the same translated member ten times per firing. */
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
>       this->tmp = curlwpsinfo->pr_stype;
> }
> 
> ERROR
> {
>       @[arg2] = count();
> }
> ---8<---
> 
> Which resulted in a table like this (left column: index of the faulting
> action, arg2; right column: number of errors):
> 
>                 9                1
>                10                1
>                 5                2
>                 7                2
>                 1                3
>                 3                3
>                 2                5
>                 6                5
>                 4               10
> 
> So within a single firing, curlwpsinfo->pr_stype can evaluate successfully
> and then fail on a later evaluation. The translator for that field looks
> like this:
> 
>         pr_stype = T->t_sobj_ops ? T->t_sobj_ops->sobj_type : 0;
> 
> This compiles to this DIF code:
> 
> OFF OPCODE      INSTRUCTION
> 00: 29010001    ldgs DT_VAR(256), %r1           ! DT_VAR(256) = "curthread"
> 01: 25000002    setx DT_INTEGER[0], %r2         ! 0x0
> 02: 04010201    sll  %r1, %r2, %r1
> 03: 05010201    srl  %r1, %r2, %r1
> 04: 0e010002    mov  %r1, %r2
> 05: 25000103    setx DT_INTEGER[1], %r3         ! 0x88
> 06: 07020302    add  %r2, %r3, %r2
> 07: 22020002    ldx  [%r2], %r2
> 08: 10020000    tst  %r2
> 09: 12000011    be   17
> 10: 0e010002    mov  %r1, %r2
> 11: 25000103    setx DT_INTEGER[1], %r3         ! 0x88
> 12: 07020302    add  %r2, %r3, %r2
> 13: 22020002    ldx  [%r2], %r2
> 14: 1e020002    ldsw [%r2], %r2
> 15: 0e020002    mov  %r2, %r2
> 16: 11000012    ba   18
> 17: 25000002    setx DT_INTEGER[0], %r2         ! 0x0
> 18: 25000203    setx DT_INTEGER[2], %r3         ! 0x38
> 19: 04020302    sll  %r2, %r3, %r2
> 20: 2e020302    sra  %r2, %r3, %r2
> 21: 23000002    ret  %r2
> 
> We can see that we load the t_sobj_ops member once at offset 07 and then
> again at offset 13 (right before we load sobj_type at offset 14). The
> t_sobj_ops member can be set to NULL asynchronously by other threads, so
> this double load introduces a window for the failure that you're seeing.
> 
> Either we need to use a temporary, probe-local variable (one that can't
> conflict with a user-defined variable), or we need to apply some
> optimization to the generated DIF.
> 
> I've filed this bug:
> 
>   6691541 curlwpsinfo->pr_stype races
> 
> Adam
> 
> On Fri, Apr 18, 2008 at 04:20:31PM -0400, Lytvyn, Oleksandr (IT) wrote:
> > Hi!
> >  
> > Has anyone seen this? This one baffles me. I run whatfor.d from
> > /usr/demo/dtrace, and here's what I get:
> >  
> > dtrace: script '/usr/demo/dtrace/whatfor.d' matched 12 probes
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu): invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu): invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu): invalid address (0x0) in action #1 at DIF offset 56
> > dtrace: error on enabled probe ID 1 (ID 681: sched:unix:resume:off-cpu): invalid address (0x0) in action #1 at DIF offset 56
> > ...
> >  
> > Multitudes of those. Apparently, action #1 of probe ID 1 is:
> >  
> > self->sobj = curlwpsinfo->pr_stype;
> >  
> > So, which address is invalid here? curlwpsinfo is used in the predicate,
> > so it cannot be 0x0, because then the predicate would fail too. And
> > pr_stype is supposed to be a char.
> >  
> > What's wrong here?
> >  
> > BTW, I found another report of this error on the web (in German, as it
> > happens), but unfortunately there were no responses.
> >  
> > Thanks,
> >  
> > Oleksandr Lytvyn
> > Morgan Stanley | Technology
> > 210 Carnegie Center, 4th Floor | Princeton, NJ  08540
> > Phone: +1 609 936-4026
> > Mobile: +1 732 773-4145
> > [EMAIL PROTECTED]
> > --------------------------------------------------------
> > 
> > NOTICE: If received in error, please destroy and notify sender. Sender does not intend to waive confidentiality or privilege. Use of this email is prohibited when received in error.
> 
> > _______________________________________________
> > dtrace-discuss mailing list
> > dtrace-discuss@opensolaris.org
> 
> 
> -- 
> Adam Leventhal, Fishworks                        
> http://blogs.sun.com/ahl
>
