[I was wrong: it is code elimination, not motion. volatile is
not a fix.]

On 2019-Jan-1, at 19:37, Mark Millard <marklmi at yahoo.com> wrote:

> [It looks to me like the assembler code has some code moved out of the
> loop that it should not be (by intent). It appears that calling
> tcmpset_32 does not prevent code motion to before the call and
> the variable involved was not declared volatile.]
> 
> On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
>> cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, 
>> unlike
>> the last hang-up that I analyzed. I do not yet know how repeatable this is
>> but the original hang-up and the one experiment the below is from.
>> 
>> From top:
>> 
>> PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU 
>> COMMAND
>> 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% 
>> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
>> /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>> 
>> Note: The vast margority of the 36:06 has been stuck in the uwait loop 
>> involved.
>> 
>> From ps -auxd:
>> 
>> root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |    
>>        `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w 
>> graphics/poppler-qt5
>> root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |    
>>          |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w 
>> graphics/poppler-qt5
>> root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |    
>>          |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg 
>> (poppler-qt5-0.72.0) (sh)
>> root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |    
>>          | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg 
>> (poppler-qt5-0.72.0) (sh)
>> root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |    
>>          |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
>> root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |    
>>          |     `-- /usr/bin/make -f Makefile 
>> DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
>> root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |    
>>          |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
>> root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |    
>>          |         `-- /nxb-bin/usr/bin/make -f 
>> qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
>> root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |    
>>          |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake 
>> -E cmake_autogen /wrkdirs/usr/ports/graphics/
>> root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |    
>>          `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w 
>> graphics/poppler-qt5
>> 
>> 
>> (gdb) attach 12789
>> Attaching to process 12789
>> Reading symbols from 
>> /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
>> [New LWP 101168 of process 12789]
>> [New LWP 101178 of process 12789]
>> [New LWP 101499 of process 12789]
>> [Switching to LWP 100304 of process 12789]
>> _umtx_op () at _umtx_op.S:3
>> 3    RSYSCALL(_umtx_op)
>> (gdb) info threads
>> Id   Target Id                   Frame 
>> * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
>> 2    LWP 101168 of process 12789 _umtx_op_err () at 
>> /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>> 3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
>> 4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int 
>> (dst=<optimized out>, expect=<optimized out>, src=536870912) at 
>> /usr/include/machine/atomic.h:220
>> (gdb) thread 4
>> [Switching to thread 4 (LWP 101499 of process 12789)]
>> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, 
>> expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
>> 220  ATOMIC_CMPSET(int);
>> 
>> (gdb) bt
>> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, 
>> expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
>> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
>> #2  freebsd_rw_unlock (target_addr=4108246528) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
>> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, 
>> op=536870912, val=<optimized out>, uaddr=<optimized out>, 
>> target_time=<optimized out>)
>>   at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
>> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, 
>> arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, 
>> arg5=0, arg6=-184411592, arg7=-199471616, 
>>   arg8=-1622188640) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
>> #5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
>> #6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
>> #7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
>> #8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at 
>> /usr/src/lib/libthr/thread/thr_create.c:291
>> #9  0x0000000000000000 in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>> 
>> 181  #define ATOMIC_CMPSET(TYPE)                             \
>> 182  static __inline int                                     \
>> 183  atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE 
>> src) \
>> 184  {                                                       \
>> (gdb) 
>> 185          u_char res;                                     \
>> 186                                                          \
>> 187          __asm __volatile(                               \
>> 188          "       " MPLOCKED "            "               \
>> 189          "       cmpxchg %3,%1 ; "                       \
>> 190          "       sete    %0 ;            "               \
>> 191          "# atomic_cmpset_" #TYPE "      "               \
>> 192          : "=q" (res),                   /* 0 */         \
>> 193            "+m" (*dst),                  /* 1 */         \
>> 194            "+a" (expect)                 /* 2 */         \
>> (gdb) 
>> 195          : "r" (src)                     /* 3 */         \
>> 196          : "memory", "cc");                              \
>> 197          return (res);                                   \
>> 198  }                                                       \
>> . . .
>> 218  ATOMIC_CMPSET(char);
>> 219  ATOMIC_CMPSET(short);
>> 220  ATOMIC_CMPSET(int);
>> 221  ATOMIC_CMPSET(long);
>> 
>> (gdb) up
>> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
>> 178      return atomic_cmpset_acq_32(addr, current, new);
>> (gdb) list
>> 173  static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
>> 174  {
>> 175      uint32_t current = tswap32(a);
>> 176      uint32_t new = tswap32(b);
>> 177  
>> 178      return atomic_cmpset_acq_32(addr, current, new);
>> 179  }
>> 
>> (gdb) up
>> #2  freebsd_rw_unlock (target_addr=4108246528) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
>> 1264             if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 
>> 1))) {
>> (gdb) list
>> 1259             }
>> 1260         }
>> 1261     } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
>> 1262         /* decrement reader count */
>> 1263         for (;;) {
>> 1264             if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 
>> 1))) {
>> 1265                 if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
>> 1266                     unlock_user_struct(target_urwlock,
>> 1267                         target_addr, 1);
>> 1268                         return -TARGET_EPERM;
>> (gdb) 
>> 1269                  }
>> 1270             } else {
>> 1271                 break;
>> 1272             }
>> 1273         }
>> 1274     } else {
>> 1275         unlock_user_struct(target_urwlock, target_addr, 1);
>> 1276         return -TARGET_EPERM;
>> 1277     }
>> 1278 
>> 
>> The above loop is where the code is stuck looping, using most of the 
>> capacity of
>> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>> 
>> (gdb) print state
>> $2 = 536870913
>> 
>> I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
>> is running.
>> 
>> 
>> 
>> To show a little more context before the assembler for the routine that ends 
>> up
>> containing that loop:
>> 
>> (gdb) up
>> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, 
>> op=536870912, val=<optimized out>, uaddr=<optimized out>, 
>> target_time=<optimized out>)
>>   at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
>> 474          ret = freebsd_rw_unlock(obj);
>> (gdb) list
>> 469              ret = freebsd_rw_wrlock(obj, val, 0, NULL);
>> 470          }
>> 471          break;
>> 472  
>> 473      case TARGET_UMTX_OP_RW_UNLOCK:
>> 474          ret = freebsd_rw_unlock(obj);
>> 475          break;
>> 476  
>> 477  #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
>> 478  #ifdef UMTX_OP_MUTEX_WAKE2
>> 
>> (gdb) up
>> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, 
>> arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, 
>> arg5=0, arg6=-184411592, arg7=-199471616, 
>>   arg8=-1622188640) at 
>> /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
>> 1364         ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
>> (gdb) list
>> 1359         ret = do_freebsd__umtx_unlock(arg1);
>> 1360         break;
>> 1361 #endif
>> 1362 
>> 1363     case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
>> 1364         ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
>> 1365         break;
>> 1366 
>> 1367         /*
>> 1368          * ioctl(2)
>> 
>> As for the assembler level view for the routine containing the
>> loop (via inlining and such):
>> 
>> Dump of assembler code for function freebsd_rw_unlock:
>>  0x0000000060051bd0 <+0>:    push   %rbp
>>  0x0000000060051bd1 <+1>:    mov    %rsp,%rbp
>>  0x0000000060051bd4 <+4>:    push   %rbx
>>  0x0000000060051bd5 <+5>:    push   %rax
>>  0x0000000060051bd6 <+6>:    mov    %edi,%ebx
>>  0x0000000060051bd8 <+8>:    mov    $0x20,%esi
>>  0x0000000060051bdd <+13>:   mov    $0x3,%edx
>>  0x0000000060051be2 <+18>:   callq  0x60036b70 <page_check_range>
>>  0x0000000060051be7 <+23>:   mov    %eax,%ecx
>>  0x0000000060051be9 <+25>:   mov    $0xfffffff2,%eax
>>  0x0000000060051bee <+30>:   test   %ecx,%ecx
>>  0x0000000060051bf0 <+32>:   jne    0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051bf2 <+34>:   mov    %ebx,%edi
>>  0x0000000060051bf4 <+36>:   add    0x26a6355(%rip),%rdi        # 0x626f7f50 
>> <guest_base>
>>  0x0000000060051bfb <+43>:   je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051bfd <+45>:   mov    (%rdi),%ecx
>>  0x0000000060051bff <+47>:   mov    0x4(%rdi),%edx
>>  0x0000000060051c02 <+50>:   test   %ecx,%ecx
>>  0x0000000060051c04 <+52>:   js     0x60051c30 <freebsd_rw_unlock+96>
>>  0x0000000060051c06 <+54>:   test   $0x1fffffff,%ecx
>>  0x0000000060051c0c <+60>:   je     0x60051c4b <freebsd_rw_unlock+123>
>>  0x0000000060051c0e <+62>:   lea    -0x1(%rcx),%esi
>>  0x0000000060051c11 <+65>:   nopw   %cs:0x0(%rax,%rax,1)
>>  0x0000000060051c1b <+75>:   nopl   0x0(%rax,%rax,1)
>> NOTE: loop is below:
>>  0x0000000060051c20 <+80>:   mov    %ecx,%eax
>> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c26 <+86>:   sete   %bl
>>  0x0000000060051c29 <+89>:   test   %bl,%bl
>>  0x0000000060051c2b <+91>:   je     0x60051c20 <freebsd_rw_unlock+80>
>> Note: loop is above
>>  0x0000000060051c2d <+93>:   jmp    0x60051c57 <freebsd_rw_unlock+135>
>>  0x0000000060051c2f <+95>:   nop
>>  0x0000000060051c30 <+96>:   mov    %ecx,%esi
>>  0x0000000060051c32 <+98>:   and    $0x7fffffff,%esi
>>  0x0000000060051c38 <+104>:  mov    %ecx,%eax
>>  0x0000000060051c3a <+106>:  lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c3e <+110>:  sete   %bl
>>  0x0000000060051c41 <+113>:  test   %bl,%bl
>>  0x0000000060051c43 <+115>:  jne    0x60051c57 <freebsd_rw_unlock+135>
>>  0x0000000060051c45 <+117>:  mov    (%rdi),%ecx
>>  0x0000000060051c47 <+119>:  test   %ecx,%ecx
>>  0x0000000060051c49 <+121>:  js     0x60051c30 <freebsd_rw_unlock+96>
>>  0x0000000060051c4b <+123>:  mov    $0xffffffff,%eax
>>  0x0000000060051c50 <+128>:  add    $0x8,%rsp
>>  0x0000000060051c54 <+132>:  pop    %rbx
>>  0x0000000060051c55 <+133>:  pop    %rbp
>>  0x0000000060051c56 <+134>:  retq   
>>  0x0000000060051c57 <+135>:  test   $0x2,%dl
>>  0x0000000060051c5a <+138>:  jne    0x60051c70 <freebsd_rw_unlock+160>
>>  0x0000000060051c5c <+140>:  test   $0x40000000,%ecx
>>  0x0000000060051c62 <+146>:  jne    0x60051c82 <freebsd_rw_unlock+178>
>>  0x0000000060051c64 <+148>:  xor    %eax,%eax
>>  0x0000000060051c66 <+150>:  test   $0x20000000,%ecx
>>  0x0000000060051c6c <+156>:  je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051c6e <+158>:  jmp    0x60051c89 <freebsd_rw_unlock+185>
>>  0x0000000060051c70 <+160>:  test   $0x20000000,%ecx
>>  0x0000000060051c76 <+166>:  jne    0x60051c89 <freebsd_rw_unlock+185>
>>  0x0000000060051c78 <+168>:  xor    %eax,%eax
>>  0x0000000060051c7a <+170>:  test   $0x40000000,%ecx
>>  0x0000000060051c80 <+176>:  je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051c82 <+178>:  mov    $0x1,%edx
>>  0x0000000060051c87 <+183>:  jmp    0x60051c8e <freebsd_rw_unlock+190>
>>  0x0000000060051c89 <+185>:  mov    $0x7fffffff,%edx
>>  0x0000000060051c8e <+190>:  mov    $0x3,%esi
>>  0x0000000060051c93 <+195>:  xor    %ecx,%ecx
>>  0x0000000060051c95 <+197>:  xor    %r8d,%r8d
>>  0x0000000060051c98 <+200>:  callq  0x601b5970 <_umtx_op>
>>  0x0000000060051c9d <+205>:  mov    %eax,%edi
>>  0x0000000060051c9f <+207>:  add    $0x8,%rsp
>>  0x0000000060051ca3 <+211>:  pop    %rbx
>>  0x0000000060051ca4 <+212>:  pop    %rbp
>> --Type <RET> for more, q to quit, c to continue without paging--
>>  0x0000000060051ca5 <+213>:  jmpq   0x60040a40 <get_errno>
>> End of assembler dump.
>> 
>> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>> 
>> rax            0x20000001          536870913
>> rbx            0xf4dede00          4108246528
>> rcx            0x20000001          536870913
>> rdx            0x2                 2
>> rsi            0x20000000          536870912
>> rdi            0xf4dede80          4108246656
>> 
>> (gdb) x 0xf4dede80
>> 0xf4dede80:  0x40000001
>> 
>>  0x0000000060051c20 <+80>:   mov    %ecx,%eax
>> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c26 <+86>:   sete   %bl
>>  0x0000000060051c29 <+89>:   test   %bl,%bl
>>  0x0000000060051c2b <+91>:   je     0x60051c20 <freebsd_rw_unlock+80>
>> 
>> 
>> At this point I do not have interpretation of the details, not even
>> a comparison to the source code.
> 
> 
> . . . (misleading notes deleted) . . .

I was wrong about the code motion: in . . .

   } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
        /* decrement reader count */
        for (;;) {
            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
                    unlock_user_struct(target_urwlock,
                        target_addr, 1);
                        return -TARGET_EPERM;
                 }
            } else {
                break;
            }
        }

the outer test was leading to the inner-if being eliminated. While
"volatile state" stopped that from happening it did not fix the hang-up
in the loop. (There are likely other consequences to the use of memory
involved instead of just a register as well.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[email protected]"

Reply via email to