Hi Yann,

Am 14.02.23 um 14:31 schrieb Yann Ylavic:
I think we have several issues here:
1. we should set WEAK_MEMORY_ORDERING for anything but __i386__,
__x86_64__, __s390__ and __s390x__
2. we should not use direct 64bit load/store on 32bit systems for
atomic operations
3. on 32bit systems the alignment for an uint64_t is usually 4 (not 8
like on a 64bit systems), which hurts (at best) or prevents atomicity
even with builtins.

For 1. and 2. the generic/mutex case was addressed in r1907541
(apr_atomic_set64() was doing the right thing already), and the
builtins case is addressed in r1907637 (hopefully).
Windows code is also concerned but I believe it's doing the right
thing already by checking _M_X64 (x86_64 CPU) or should we check for
_WIN64 too (or APR_SIZEOF_VOIDP >= 8) for WIN32 running on x64? What
about the alignment of uint64_t on WIN32, the Interlocked*64()
functions seem to require 8-byte alignment so should we fall back to
generic/mutex on WIN32 too?

For 3., r1907642 should now take care of the alignment to define (or
not) HAVE_ATOMIC_BUILTINS64 and HAVE__ATOMIC_BUILTINS64, with a
fallback to USE_ATOMICS_GENERIC for cases where it matters. This was
tested by looking at the (32bit) assembly generated for different
compilers/options/builtins here: https://godbolt.org/z/r4daf6b4v
(comment in/out some "__attribute__((aligned(8)))" and/or "&& 0" to
see what helps or not).
Notably, while both gcc and clang seem to implement the same kind of
compare-and-swap loop for _read64 and _set64 with -m32 and the legacy
__sync_* builtins, they seem to disagree on whether __atomic builtins
should defer to libatomic (thus possibly mutex) when an uint64_t is
not 8-byte aligned. gcc will use some x87/FPU instructions
(fild/fistp) regardless of the alignment whereas clang will issue
calls to libatomic whenever apr_uint64_t is not forcibly 8-byte
aligned.
Dunno who's correct here (whether x87 instructions work with 4-byte
alignment, CPU cacheline crossing and so on..), but this seems to beg
for a forced "typedef uint64_t apr_uint64_t
__attribute__((aligned(8)));" (an ABI change) or a new API for 64bit
atomics (with properly aligned apr_atomic_u64_t) on the APR side
anyway to do the right thing on 32bit systems..
For now (after r1907642) this means that compiling a 32bit APR will
USE_ATOMICS_BUILTINS64 with gcc and NEED_ATOMICS_GENERIC64 with clang.

Does this all work for you?

Thanks for all the fixes. I agree with all the changes you did. However, I don't know enough about the windows compilers to comment on that. I would also agree that a 8-byte aligned apr_uint64_t would probably make sense for apr 2.0. On x86 the FPU instructions may work without alignment, but I expect it to have some performance impact.

I have applied your commits to Debian's 1.7.2 and there was no test failure during build. I also executed testatomic on powerpc ~ 100 times without problems.


Finally it seems that both -march=i[56]86 (but not i[34]86) provide
the same atomic builtins too (with gcc), so maybe we could use them
instead of the forced generic/mutex implementation (currently) if it's
how distros build 32bit APR (per
https://bz.apache.org/bugzilla/show_bug.cgi?id=66457). So I think
something like this would be fine:

I think the intention of the current code was that libapr compiled on any 32bit x86 would run everywhere by default. I don't think this behavior should be changed for apr 1.x, but that is not a strong opinion because 486 is so old. In Debian, we use --enable-nonportable-atomics anyway.

Chers,
Stefan


Index: configure.in
===================================================================
--- configure.in    (revision 1907642)
+++ configure.in    (working copy)
@@ -566,6 +566,9 @@ if test "$ap_cv_atomic_builtins" = "yes" -o "$ap_c
      if test "$ap_cv__atomic_builtins" = "yes"; then
          AC_DEFINE(HAVE__ATOMIC_BUILTINS, 1, [Define if compiler
provides 32bit __atomic builtins])
      fi
+    has_atomic_builtins=yes
+else
+    has_atomic_builtins=no
  fi

  AC_CACHE_CHECK([whether the compiler provides 64bit atomic builtins],
[ap_cv_atomic_builtins64],
@@ -829,10 +832,15 @@ AC_ARG_ENABLE(nonportable-atomics,
     force_generic_atomics=yes
   fi
  ],
-[case $host_cpu in
-   i[[456]]86) force_generic_atomics=yes ;;
-   *) force_generic_atomics=no
-      case $host in
+[force_generic_atomics=no
+case $host_cpu in
+   i[[34]]86) force_generic_atomics=yes;;
+   i[[56]]86)
+      if test "$has_atomic_builtins" != "yes"; then
+        force_generic_atomics=yes
+      fi
+      ;;
+   *) case $host in
           *solaris2.10*)
              AC_TRY_COMPILE(
                  [#include <atomic.h>],
?


Regards;
Yann.

Reply via email to