Re: libgomp and OMP_NUM_THREADS

Jakub Jelinek Fri, 20 Jan 2006 00:43:03 -0800

On Thu, Jan 19, 2006 at 08:49:34PM -0500, David Edelsohn wrote:
> >>>>> Jakub Jelinek writes:
> 
> Jakub> * config/rs6000/rs6000.md (UNSPEC_LWSYNC, UNSPEC_ISYNC,
> Jakub> UNSPEC_SYNC_OP, UNSPEC_ATOMIC, UNSPEC_CMPXCHG, UNSPEC_XCHG): Rename
> Jakub> to...
> Jakub> (UNSPECV_LWSYNC, UNSPECV_ISYNC, UNSPECV_SYNC_OP, UNSPECV_ATOMIC,
> Jakub> UNSPECV_CMPXCHG, UNSPECV_XCHG): ... these.
> 
>       I thought Geoff and Richard agreed that volatile was not
> necessary.


I'm open to suggestions on how to fix it differently.
UNSPECV_* certainly works for me.
I'm pasting here parts of my earlier mails:

The problem seems to be miscompilation of libgomp/config/linux/bar.c
on ppc64, particularly that first scheduling swaps the two lines:

      unsigned int generation = bar->generation;

      gomp_mutex_unlock (&bar->mutex);

In life2 dump we have:

(insn 38 37 39 2 ../../../libgomp/config/linux/bar.c:47 (set (reg:SI 131 [ 
<variable>.generation ])
        (mem/s:SI (plus:DI (reg/v/f:DI 126 [ bar ])
                (const_int 12 [0xc])) [4 <variable>.generation+0 S4 A32])) 279 
{*movsi_internal1} (nil)
    (nil))

(insn 39 38 40 2 ../../../libgomp/config/linux/bar.c:47 (set (reg/v:DI 124 [ 
generation ])
        (zero_extend:DI (reg:SI 131 [ <variable>.generation ]))) 14 
{*zero_extendsidi2_internal1} (nil)
    (nil))

(insn 40 39 41 2 ../../../libgomp/config/linux/bar.c:47 (set (reg/v/f:DI 122 [ 
mutex ])
        (reg/v/f:DI 126 [ bar ])) 300 {*movdi_internal64} (nil)
    (nil))

(note 41 40 42 2 ("../../../libgomp/config/linux/mutex.h") 54)

(insn 42 41 43 2 ../../../libgomp/config/linux/mutex.h:54 (parallel [
            (set (reg:SI 132)
                (mem/v:SI (reg/v/f:DI 122 [ mutex ]) [0 S4 A8]))
            (set (mem/v:SI (reg/v/f:DI 122 [ mutex ]) [0 S4 A8])
                (unspec:SI [
                        (const_int 0 [0x0])
                    ] 43))
            (clobber (scratch:SI))
            (clobber (scratch:CC))
        ]) 543 {sync_lock_test_and_setsi} (nil)
    (nil))

while in sched1 dump already:

(note:HI 41 43 42 2 ("../../../libgomp/config/linux/mutex.h") 54)

(insn:HI 42 41 128 2 ../../../libgomp/config/linux/mutex.h:54 (parallel [
            (set (reg:SI 132)
                (mem/v:SI (reg/v/f:DI 126 [ bar ]) [0 S4 A8]))
            (set (mem/v:SI (reg/v/f:DI 126 [ bar ]) [0 S4 A8])
                (unspec:SI [
                        (const_int 0 [0x0])
                    ] 43))
            (clobber (scratch:SI))
            (clobber (scratch:CC))
        ]) 543 {sync_lock_test_and_setsi} (nil)
    (expr_list:REG_UNUSED (scratch:CC)
        (expr_list:REG_UNUSED (scratch:SI)
            (nil))))

(note 128 42 39 2 ("../../../libgomp/config/linux/bar.c") 47)

(insn:HI 39 128 44 2 ../../../libgomp/config/linux/bar.c:47 (set (reg/v:DI 124 
[ generation ])
        (zero_extend:DI (mem/s:SI (plus:DI (reg/v/f:DI 126 [ bar ])
                    (const_int 12 [0xc])) [4 <variable>.generation+0 S4 A32]))) 
14 {*zero_extendsidi2_internal1} (nil)
    (nil))

Not sure why sched allows that, because insn 42 clearly operates on volatile
memory.  Do you think that's a bug in sched that it should be honoring
/v and not moving insns accross it, or that something is broken with the
ppc* patterns?

The mem in the sync insn has alias set 0 and no attributes
except MEM_VOLATLE_P.  The reason why sched1 decided to move it anyway
is that it proved that the MEMs are different:

(insn 38 37 39 2 ../../../libgomp/config/linux/bar.c:47 (set (reg:SI 131 [ 
<variable>.generation ])
        (mem/s:SI (plus:DI (reg/v/f:DI 126 [ bar ])
                (const_int 12 [0xc])) [4 <variable>.generation+0 S4 A32])) 279 
{*movsi_internal1} (nil)
    (nil))

(insn 40 39 41 2 ../../../libgomp/config/linux/bar.c:47 (set (reg/v/f:DI 122 [ 
mutex ])
        (reg/v/f:DI 126 [ bar ])) 300 {*movdi_internal64} (nil)
    (nil))

(note 41 40 42 2 ("../../../libgomp/config/linux/mutex.h") 54)

(insn 42 41 43 2 ../../../libgomp/config/linux/mutex.h:54 (parallel [
            (set (reg:SI 132)
                (mem/v:SI (reg/v/f:DI 122 [ mutex ]) [0 S4 A8]))
            (set (mem/v:SI (reg/v/f:DI 122 [ mutex ]) [0 S4 A8])
                (unspec:SI [
                        (const_int 0 [0x0])
                    ] 43))
            (clobber (scratch:SI))
            (clobber (scratch:CC))
        ]) 543 {sync_lock_test_and_setsi} (nil)
    (nil))

as pseudo 122 is set to 126, both MEMs have the same base register, but one
is SI(%r126) and SI(%r126 + 12), so writing into one provably won't affect
the other one and vice versa.
I still don't understand why loop opts can (if they do that) hoist memory loads
out of loops if the loop has UNSPEC_VOLATLE before the original memory 
reference.
Can it do the same with
for (...)
{
  __asm __volatile ("...." : "=r" (x) : "r" (y));
  z = mem;
  ...
}
=>
z = mem;
for (...)
{
  __asm __volatile ("...." : "=r" (x) : "r" (y));
  ...
}
?  If it can, can it do the same with
__asm __volatile ("...." : : : "memory");
?  If it can't with "memory" clobber, then perhaps we should add
(clobber (mem:BLK (scratch)))
to the sync builtin insns.

        Jakub

Re: libgomp and OMP_NUM_THREADS

Reply via email to