On Tue, 15 Oct 2019 at 01:16, Yann Ylavic <ylavic....@gmail.com> wrote:
>
> On Tue, Oct 8, 2019 at 1:10 PM <i...@apache.org> wrote:
> >
> > Author: ivan
> > Date: Tue Oct  8 11:10:32 2019
> > New Revision: 1868129
> >
> > URL: http://svn.apache.org/viewvc?rev=1868129&view=rev
> > Log:
> > apr_atomic_read64(): Fix non-atomic read on 32-bit Windows.
> []
> > --- apr/apr/trunk/atomic/win32/apr_atomic64.c (original)
> > +++ apr/apr/trunk/atomic/win32/apr_atomic64.c Tue Oct  8 11:10:32 2019
> > @@ -51,7 +51,9 @@ APR_DECLARE(void) apr_atomic_set64(volat
> >
> >  APR_DECLARE(apr_uint64_t) apr_atomic_read64(volatile apr_uint64_t *mem)
> >  {
> > -    return *mem;
> > +    /* 64-bit read is not atomic on 32-bit platform: use 
> > InterlockedCompareExchange
> > +       to perform atomic read. */
> > +    return InterlockedCompareExchange64((volatile LONG64 *)mem, 0, 0);
> >  }
>
> Shouldn't we make this #ifdef'd on WIN32/WIN64? Despite the win32
> dirname, I think both Windows(es) share the same atomic/ code (same
> applies to apr_atomic_set64() I think, where *mem = val should be fine
> on WIN64).
>
> Then I don't know if we care enough about WIN32, but if so the
> InterlockedCompareExchange64() call here is maybe a bit of a hammer
> for usual apr_atomic_read64() usage. For things like lightly check for
> the current 64bit shared value and do the heavy CAS work for some
> special/unlikely condition only, this change is going to kill
> performances. Since we provide _cas64() already, I fell that
> read64/set64() should always be lightweight (when/if possible).
>
I considered this optimization, but decided not to implement it to
avoid additional #ifdef.

Do you have any evidence that CAS would be significantly slower than
just read on x64?

As I see in disassembly InterlockedCompareExchange() is compiled to
just three instructions on x64:
[[[
xor         ebx,ebx
xor         eax,eax
lock cmpxchg qword ptr [src],rbx
]]]

Another small argument is that .NET always uses InterlockedCompareExchange()
for Interlocked.Read() [1]

But I can add special code for x64 if you think it's worth it, along the lines
of the attached patch.

> I know that [2] says "Simple reads and writes to properly aligned
> 64-bit variables are atomic on 64-bit Windows. Reads and writes to
> 64-bit values are not guaranteed to be atomic on 32-bit Windows.", but
> it looks like in practice for CPUs that Windows cares about, an
> aligned 8-bytes (eg. our apr_uint64_t type) load/store can always be
> atomic. The load/store are likely compiled using x87 registers or xmm
> or sse or at worse lock cmpxchg8b (i.e. the above), and supposedly
> some 64bit or more registers available on ARM too.
>
> I'm no Windows expert, but wonder if we could use inline
> ReadAcquire64/WriteRelease64() functions from winnt.h/wdm.h. They seem
> to pair with Interlocked* builtins (same API), and have the semantics
> we want. I found them by naïvely (and unsuccessfully) searching for
> InterlockedRead/InterlockedWrite() functions, and falling into the
> discussion in [1]. It seems that there is no such functions in the
> windows API because .. there are not needed (well, IMHO, if atomic
> load64/store64 should be implemented using cas64, they look needed to
> me..).
> This seems to be confirmed by the "winnt.h" file I downloaded from the
> net, for the CPUs handled (and cared about) there,
> ReadAcquire64/WriteRelease64() are implemented using volatile access
> (i.e. *(volatile LONG64*)mem [= val]), plus a barrier for arm(s?), so
> relying on the native acquire/release semantics from the CPU/cache.
>
> So possibly r1868129 is not necessary, or if
> ReadAcquire64/WriteRelease64() can be used easily could we do that
> instead? (having a look at the WIN32 assembly code for
> apr_atomic_read64/set64() using ReadAcquire64/WriteRelease64() could
> be instructive, though)
>
The r1868129 is necessary: there is test in r1868129 that fails on x86,
without fix.

While ReadAcquire64()/WriteRelease64() look promising,
1. They doesn't provide atomic read: test added in r1868129 fails if
   I replace InterlockedCompareExchange64() to ReadAcquire64() in
   apr_atomic_read64()..
2. They are undocumented.

--
Ivan
Index: atomic/win32/apr_atomic64.c
===================================================================
--- atomic/win32/apr_atomic64.c (revision 1868468)
+++ atomic/win32/apr_atomic64.c (working copy)
@@ -46,14 +46,28 @@
 
 APR_DECLARE(void) apr_atomic_set64(volatile apr_uint64_t *mem, apr_uint64_t 
val)
 {
+#if defined(_WIN64)
+    /* 
https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access
+     * "Simple reads and writes to properly aligned 64-bit variables are 
atomic on 64-bit Windows."*/
+    *mem = val;
+#else
+    /* 64-bit write is not atomic on 32-bit platform: use InterlockedExchange64
+       to perform atomic write. */
     InterlockedExchange64((volatile LONG64 *)mem, val);
+#endif
 }
 
 APR_DECLARE(apr_uint64_t) apr_atomic_read64(volatile apr_uint64_t *mem)
 {
+#if defined(_WIN64)
+    /* 
https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access
+     * "Simple reads and writes to properly aligned 64-bit variables are 
atomic on 64-bit Windows."*/
+    return *mem;
+#else
     /* 64-bit read is not atomic on 32-bit platform: use 
InterlockedCompareExchange
        to perform atomic read. */
     return InterlockedCompareExchange64((volatile LONG64 *)mem, 0, 0);
+#endif
 }
 
 APR_DECLARE(apr_uint64_t) apr_atomic_cas64(volatile apr_uint64_t *mem, 
apr_uint64_t with,

Reply via email to