On Wed, Aug 21, 2019 at 11:48:43AM -0400, Mathieu Desnoyers wrote:
> ----- On Aug 21, 2019, at 8:33 AM, Peter Zijlstra pet...@infradead.org wrote:
> 
> > On Wed, Aug 21, 2019 at 06:23:10AM -0700, Paul E. McKenney wrote:
> >> On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> > 
> >> > and so it is using a store-pair instruction to reduce the complexity in
> >> > the immediate generation. Thus, the 64-bit store will only have 32-bit
> >> > atomicity. In fact, this is scary because if I change bar to:
> >> > 
> >> > void bar(u64 *x)
> >> > {
> >> >  *(volatile u64 *)x = 0xabcdef10abcdef10;
> >> > }
> >> > 
> >> > then I get:
> >> > 
> >> > bar:
> >> >  mov     w1, 61200
> >> >  movk    w1, 0xabcd, lsl 16
> >> >  str     w1, [x0]
> >> >  str     w1, [x0, 4]
> >> >  ret
> >> > 
> >> > so I'm not sure that WRITE_ONCE would even help :/
> >> 
> >> Well, I can have the LWN article cite your email, then.  So thank you
> >> very much!
> >> 
> >> Is generation of this code for a 64-bit volatile store considered a bug?
> >> Or does ARMv8 exclude the possibility of 64-bit MMIO registers?  And I
> >> would guess that Thomas and Linus would ask a similar bugginess question
> >> for normal stores.  ;-)
> > 
> > I'm calling this a compiler bug; the way I understand volatile this is
> > very much against the intended use case. That is, this is buggy even
> > on UP vs signals or MMIO.
> 
> And here is a simpler reproducer on my gcc-8.3.0 (aarch64) compiled with -O2:
> 
> volatile unsigned long a;
>  
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
> }
> 
> void fct(void)
> {
>         a = 0x1234567812345678ULL;
>    0:   90000000        adrp    x0, 8 <fct+0x8>
>    4:   528acf01        mov     w1, #0x5678                     // #22136
>    8:   72a24681        movk    w1, #0x1234, lsl #16
>    c:   f9400000        ldr     x0, [x0]
>   10:   b9000001        str     w1, [x0]
>   14:   b9000401        str     w1, [x0, #4]
> }
>   18:   d65f03c0        ret

FWIW, and interestingly, on clang v7.0.1-8 (aarch64) I get a proper 64-bit
str for the above example (even when not using volatile):

0000000000000000 <nonvol>:
   0:   d28acf08        mov     x8, #0x5678                     // #22136
   4:   f2a24688        movk    x8, #0x1234, lsl #16
   8:   f2cacf08        movk    x8, #0x5678, lsl #32
   c:   f2e24688        movk    x8, #0x1234, lsl #48
  10:   90000009        adrp    x9, 8 <nonvol+0x8>
  14:   91000129        add     x9, x9, #0x0
  18:   f9000128        str     x8, [x9]
  1c:   d65f03c0        ret

test1.o:     file format elf64-littleaarch64


And even with -O2 it is a single store:

Disassembly of section .text:

0000000000000000 <nonvol>:
   0:   d28acf09        mov     x9, #0x5678                     // #22136
   4:   f2a24689        movk    x9, #0x1234, lsl #16
   8:   f2cacf09        movk    x9, #0x5678, lsl #32
   c:   90000008        adrp    x8, 8 <nonvol+0x8>
  10:   f2e24689        movk    x9, #0x1234, lsl #48
  14:   f9000109        str     x9, [x8]
  18:   d65f03c0        ret
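
For anyone wanting to repeat the comparison, a test file along the following
lines should be enough. This is only a sketch: the symbol name nonvol comes
from the dumps above, while vol, plain_var and volatile_var are made-up names
for illustration and not necessarily what the original test file contained.

```c
#include <stdint.h>

/* Hypothetical variable names; only the symbol `nonvol` appears in the dumps. */
uint64_t plain_var;
volatile uint64_t volatile_var;

/* Plain 64-bit store: the compiler is free to split or merge it. */
void nonvol(void)
{
	plain_var = 0x1234567812345678ULL;
}

/* Volatile 64-bit store: the case expected to be emitted as one access. */
void vol(void)
{
	volatile_var = 0x1234567812345678ULL;
}
```

Compiling this to an object file with -O2 -c and disassembling it with
objdump -d should show, per function, whether the constant is written with
one 64-bit str or with two 32-bit stores.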

thanks,

 - Joel

[...]
