Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-30 Thread Jonas Malaco
On Tue, Mar 30, 2021 at 03:51:21AM -0700, Guenter Roeck wrote:
> [ ... ]
> 
> Then please explain why _this_ use of time_after() is wrong but all
> others in the kernel are not. Also, please note that we are not
> concerned with code generation by the compiler as long as the
> generated code is correct (and I don't see any indication that
> it isn't).

The accesses to priv->temp_input[], ->fan_input[] and ->updated (where
this relates to your question about time_after()) are not wrong, but
undefined.

But if we are not concerned with code that is currently generated
correctly, which indeed is the case in the five arch x compiler
combinations I tested, then there simply is no point to this patch.

Thanks for going through this with me,
Jonas

P.S. Admittedly from a sample way too small, but in the kernel
time_after(jiffies, x) calls do not generally appear to involve an
expression x with a data race.  And there are a few calls where
READ_ONCE() is indeed used in x.


Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-30 Thread Guenter Roeck
On 3/29/21 11:27 PM, Jonas Malaco wrote:
> On Mon, Mar 29, 2021 at 10:43:55PM -0700, Guenter Roeck wrote:
>> On 3/29/21 8:16 PM, Jonas Malaco wrote:
>>> On Mon, Mar 29, 2021 at 06:01:00PM -0700, Guenter Roeck wrote:
 On 3/29/21 5:21 PM, Jonas Malaco wrote:
> On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
>> On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
>>> To avoid a spinlock, the driver explores concurrent memory accesses
>>> between _raw_event and _read, having the former updating fields on a
>>> data structure while the latter could be reading from them.  Because
>>> these are "plain" accesses, those are data races according to the Linux
>>> kernel memory model (LKMM).
>>>
>>> Data races are undefined behavior in both C11 and LKMM.  In practice,
>>> the compiler is free to make optimizations assuming there is no data
>>> race, including load tearing, load fusing and many others,[1] most of
>>> which could result in corruption of the values reported to user-space.
>>>
>>> Prevent undesirable optimizations to those concurrent accesses by
>>> marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
>>> data races, according to the LKMM, because both loads and stores to each
>>> location are now "marked" accesses.
>>>
>>> As a special case, use smp_load_acquire() and smp_load_release() when
>>> loading and storing ->updated, as it is used to track the validity of
>>> the other values, and thus has to be stored after and loaded before
>>> them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
>>> order of memory accesses.
>>>
>>> [1] https://lwn.net/Articles/793253/
>>>
>>
>> I think you lost me a bit there. What out-of-order accesses that would be
>> triggered by a compiler optimization are you concerned about here ?
>> The only "problem" I can think of is that priv->updated may have been
>> written before the actual values. The impact would be ... zero. An
>> attribute read would return "stale" data for a few microseconds.
>> Why is that a concern, and what difference does it make ?
>
> The impact of out-of-order accesses to priv->updated is indeed minimal.
>
> That said, smp_load_acquire() and smp_store_release() were meant to
> prevent reordering at runtime, and only affect architectures other than
> x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
> compiler optimizations, and x86 provides the load-acquire/store-release
> semantics by default.
>
> But the reordering issue is not a concern to me, I got carried away when
> adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
> smp_store_release() make the code work more like I intend it to, they
> are (small) costs we can spare.
>
> I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
> priv->updated.  Do you agree?
>

 No. What is the point ? The order of writes doesn't matter, the writes 
 won't
 be randomly dropped, and it doesn't matter if the reader reports old values
 for a couple of microseconds either. This would be different if the values
 were used as synchronization primitives or similar, but that isn't the case
 here. As for priv->updated, if you are concerned about lost reports and
 the 4th report is received a few microseconds before the read, I'd suggest
 to loosen the interval a bit instead.

 Supposedly we are getting reports every 500ms. We have two situations:
 - More than three reports are lost, making priv->updated somewhat relevant.
   In this case, it doesn't matter if outdated values are reported for
   a few uS since most/many/some reports are outdated more than a second
   anyway.
 - A report is received but old values are reported for a few uS. That
   doesn't matter either because reports are always outdated anyway by
   much more than a few uS anyway, and the code already tolerates up to
   2 seconds of lost reports.

 Sorry, I completely fail to see the problem you are trying to solve here.
>>>
>>> Please disregard the out-of-order accesses, I agree that preventing them
>>> "are a (small) cost we can spare".
>>>
>>> The main problem I still would like to address are the data races.
>>> While the stores and loads cannot be dropped, and we can tolerate their
>>> reordering, they could still be teared, fused, perhaps invented...
>>> According to [1] these types of optimizations are not unheard.
>>>
>>> Load tearing alone could easily produce values that are not stale, but
>>> wrong.  Do we also tolerate wrong values, even if they are infrequent?
>>>
>>> Another detail I should have mentioned sooner is that READ_ONCE() and
>>> WRITE_ONCE() cause only minor (gcc) to no (clang) changes to the
>>> generated code for x86_64 and 

Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-30 Thread Jonas Malaco
On Mon, Mar 29, 2021 at 10:43:55PM -0700, Guenter Roeck wrote:
> On 3/29/21 8:16 PM, Jonas Malaco wrote:
> > On Mon, Mar 29, 2021 at 06:01:00PM -0700, Guenter Roeck wrote:
> >> On 3/29/21 5:21 PM, Jonas Malaco wrote:
> >>> On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
>  On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
> > To avoid a spinlock, the driver explores concurrent memory accesses
> > between _raw_event and _read, having the former updating fields on a
> > data structure while the latter could be reading from them.  Because
> > these are "plain" accesses, those are data races according to the Linux
> > kernel memory model (LKMM).
> >
> > Data races are undefined behavior in both C11 and LKMM.  In practice,
> > the compiler is free to make optimizations assuming there is no data
> > race, including load tearing, load fusing and many others,[1] most of
> > which could result in corruption of the values reported to user-space.
> >
> > Prevent undesirable optimizations to those concurrent accesses by
> > marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
> > data races, according to the LKMM, because both loads and stores to each
> > location are now "marked" accesses.
> >
> > As a special case, use smp_load_acquire() and smp_load_release() when
> > loading and storing ->updated, as it is used to track the validity of
> > the other values, and thus has to be stored after and loaded before
> > them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
> > order of memory accesses.
> >
> > [1] https://lwn.net/Articles/793253/
> >
> 
>  I think you lost me a bit there. What out-of-order accesses that would be
>  triggered by a compiler optimization are you concerned about here ?
>  The only "problem" I can think of is that priv->updated may have been
>  written before the actual values. The impact would be ... zero. An
>  attribute read would return "stale" data for a few microseconds.
>  Why is that a concern, and what difference does it make ?
> >>>
> >>> The impact of out-of-order accesses to priv->updated is indeed minimal.
> >>>
> >>> That said, smp_load_acquire() and smp_store_release() were meant to
> >>> prevent reordering at runtime, and only affect architectures other than
> >>> x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
> >>> compiler optimizations, and x86 provides the load-acquire/store-release
> >>> semantics by default.
> >>>
> >>> But the reordering issue is not a concern to me, I got carried away when
> >>> adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
> >>> smp_store_release() make the code work more like I intend it to, they
> >>> are (small) costs we can spare.
> >>>
> >>> I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
> >>> priv->updated.  Do you agree?
> >>>
> >>
> >> No. What is the point ? The order of writes doesn't matter, the writes 
> >> won't
> >> be randomly dropped, and it doesn't matter if the reader reports old values
> >> for a couple of microseconds either. This would be different if the values
> >> were used as synchronization primitives or similar, but that isn't the case
> >> here. As for priv->updated, if you are concerned about lost reports and
> >> the 4th report is received a few microseconds before the read, I'd suggest
> >> to loosen the interval a bit instead.
> >>
> >> Supposedly we are getting reports every 500ms. We have two situations:
> >> - More than three reports are lost, making priv->updated somewhat relevant.
> >>   In this case, it doesn't matter if outdated values are reported for
> >>   a few uS since most/many/some reports are outdated more than a second
> >>   anyway.
> >> - A report is received but old values are reported for a few uS. That
> >>   doesn't matter either because reports are always outdated anyway by
> >>   much more than a few uS anyway, and the code already tolerates up to
> >>   2 seconds of lost reports.
> >>
> >> Sorry, I completely fail to see the problem you are trying to solve here.
> > 
> > Please disregard the out-of-order accesses, I agree that preventing them
> > "are a (small) cost we can spare".
> > 
> > The main problem I still would like to address are the data races.
> > While the stores and loads cannot be dropped, and we can tolerate their
> > reordering, they could still be teared, fused, perhaps invented...
> > According to [1] these types of optimizations are not unheard.
> > 
> > Load tearing alone could easily produce values that are not stale, but
> > wrong.  Do we also tolerate wrong values, even if they are infrequent?
> > 
> > Another detail I should have mentioned sooner is that READ_ONCE() and
> > WRITE_ONCE() cause only minor (gcc) to no (clang) changes to the
> > generated code for x86_64 and i386.[2]  While this seems contrary to the
> > 

Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Guenter Roeck
On 3/29/21 8:16 PM, Jonas Malaco wrote:
> On Mon, Mar 29, 2021 at 06:01:00PM -0700, Guenter Roeck wrote:
>> On 3/29/21 5:21 PM, Jonas Malaco wrote:
>>> On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
 On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
> To avoid a spinlock, the driver explores concurrent memory accesses
> between _raw_event and _read, having the former updating fields on a
> data structure while the latter could be reading from them.  Because
> these are "plain" accesses, those are data races according to the Linux
> kernel memory model (LKMM).
>
> Data races are undefined behavior in both C11 and LKMM.  In practice,
> the compiler is free to make optimizations assuming there is no data
> race, including load tearing, load fusing and many others,[1] most of
> which could result in corruption of the values reported to user-space.
>
> Prevent undesirable optimizations to those concurrent accesses by
> marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
> data races, according to the LKMM, because both loads and stores to each
> location are now "marked" accesses.
>
> As a special case, use smp_load_acquire() and smp_load_release() when
> loading and storing ->updated, as it is used to track the validity of
> the other values, and thus has to be stored after and loaded before
> them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
> order of memory accesses.
>
> [1] https://lwn.net/Articles/793253/
>

 I think you lost me a bit there. What out-of-order accesses that would be
 triggered by a compiler optimization are you concerned about here ?
 The only "problem" I can think of is that priv->updated may have been
 written before the actual values. The impact would be ... zero. An
 attribute read would return "stale" data for a few microseconds.
 Why is that a concern, and what difference does it make ?
>>>
>>> The impact of out-of-order accesses to priv->updated is indeed minimal.
>>>
>>> That said, smp_load_acquire() and smp_store_release() were meant to
>>> prevent reordering at runtime, and only affect architectures other than
>>> x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
>>> compiler optimizations, and x86 provides the load-acquire/store-release
>>> semantics by default.
>>>
>>> But the reordering issue is not a concern to me, I got carried away when
>>> adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
>>> smp_store_release() make the code work more like I intend it to, they
>>> are (small) costs we can spare.
>>>
>>> I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
>>> priv->updated.  Do you agree?
>>>
>>
>> No. What is the point ? The order of writes doesn't matter, the writes won't
>> be randomly dropped, and it doesn't matter if the reader reports old values
>> for a couple of microseconds either. This would be different if the values
>> were used as synchronization primitives or similar, but that isn't the case
>> here. As for priv->updated, if you are concerned about lost reports and
>> the 4th report is received a few microseconds before the read, I'd suggest
>> to loosen the interval a bit instead.
>>
>> Supposedly we are getting reports every 500ms. We have two situations:
>> - More than three reports are lost, making priv->updated somewhat relevant.
>>   In this case, it doesn't matter if outdated values are reported for
>>   a few uS since most/many/some reports are outdated more than a second
>>   anyway.
>> - A report is received but old values are reported for a few uS. That
>>   doesn't matter either because reports are always outdated anyway by
>>   much more than a few uS anyway, and the code already tolerates up to
>>   2 seconds of lost reports.
>>
>> Sorry, I completely fail to see the problem you are trying to solve here.
> 
> Please disregard the out-of-order accesses, I agree that preventing them
> "are a (small) cost we can spare".
> 
> The main problem I still would like to address are the data races.
> While the stores and loads cannot be dropped, and we can tolerate their
> reordering, they could still be teared, fused, perhaps invented...
> According to [1] these types of optimizations are not unheard.
> 
> Load tearing alone could easily produce values that are not stale, but
> wrong.  Do we also tolerate wrong values, even if they are infrequent?
> 
> Another detail I should have mentioned sooner is that READ_ONCE() and
> WRITE_ONCE() cause only minor (gcc) to no (clang) changes to the
> generated code for x86_64 and i386.[2]  While this seems contrary to the
> point I am trying to make, I want to show that, for the most part, these
> changes just lock in a reasonable compiler behavior.
> 
> Specifically, on x86_64/gcc (the most relevant arch/compiler for this
> driver) the changes are restricted to 

Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Jonas Malaco
On Mon, Mar 29, 2021 at 06:01:00PM -0700, Guenter Roeck wrote:
> On 3/29/21 5:21 PM, Jonas Malaco wrote:
> > On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
> >> On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
> >>> To avoid a spinlock, the driver explores concurrent memory accesses
> >>> between _raw_event and _read, having the former updating fields on a
> >>> data structure while the latter could be reading from them.  Because
> >>> these are "plain" accesses, those are data races according to the Linux
> >>> kernel memory model (LKMM).
> >>>
> >>> Data races are undefined behavior in both C11 and LKMM.  In practice,
> >>> the compiler is free to make optimizations assuming there is no data
> >>> race, including load tearing, load fusing and many others,[1] most of
> >>> which could result in corruption of the values reported to user-space.
> >>>
> >>> Prevent undesirable optimizations to those concurrent accesses by
> >>> marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
> >>> data races, according to the LKMM, because both loads and stores to each
> >>> location are now "marked" accesses.
> >>>
> >>> As a special case, use smp_load_acquire() and smp_load_release() when
> >>> loading and storing ->updated, as it is used to track the validity of
> >>> the other values, and thus has to be stored after and loaded before
> >>> them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
> >>> order of memory accesses.
> >>>
> >>> [1] https://lwn.net/Articles/793253/
> >>>
> >>
> >> I think you lost me a bit there. What out-of-order accesses that would be
> >> triggered by a compiler optimization are you concerned about here ?
> >> The only "problem" I can think of is that priv->updated may have been
> >> written before the actual values. The impact would be ... zero. An
> >> attribute read would return "stale" data for a few microseconds.
> >> Why is that a concern, and what difference does it make ?
> > 
> > The impact of out-of-order accesses to priv->updated is indeed minimal.
> > 
> > That said, smp_load_acquire() and smp_store_release() were meant to
> > prevent reordering at runtime, and only affect architectures other than
> > x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
> > compiler optimizations, and x86 provides the load-acquire/store-release
> > semantics by default.
> > 
> > But the reordering issue is not a concern to me, I got carried away when
> > adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
> > smp_store_release() make the code work more like I intend it to, they
> > are (small) costs we can spare.
> > 
> > I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
> > priv->updated.  Do you agree?
> > 
> 
> No. What is the point ? The order of writes doesn't matter, the writes won't
> be randomly dropped, and it doesn't matter if the reader reports old values
> for a couple of microseconds either. This would be different if the values
> were used as synchronization primitives or similar, but that isn't the case
> here. As for priv->updated, if you are concerned about lost reports and
> the 4th report is received a few microseconds before the read, I'd suggest
> to loosen the interval a bit instead.
> 
> Supposedly we are getting reports every 500ms. We have two situations:
> - More than three reports are lost, making priv->updated somewhat relevant.
>   In this case, it doesn't matter if outdated values are reported for
>   a few uS since most/many/some reports are outdated more than a second
>   anyway.
> - A report is received but old values are reported for a few uS. That
>   doesn't matter either because reports are always outdated anyway by
>   much more than a few uS anyway, and the code already tolerates up to
>   2 seconds of lost reports.
> 
> Sorry, I completely fail to see the problem you are trying to solve here.

Please disregard the out-of-order accesses, I agree that preventing them
"are a (small) cost we can spare".

The main problem I still would like to address are the data races.
While the stores and loads cannot be dropped, and we can tolerate their
reordering, they could still be teared, fused, perhaps invented...
According to [1] these types of optimizations are not unheard.

Load tearing alone could easily produce values that are not stale, but
wrong.  Do we also tolerate wrong values, even if they are infrequent?

Another detail I should have mentioned sooner is that READ_ONCE() and
WRITE_ONCE() cause only minor (gcc) to no (clang) changes to the
generated code for x86_64 and i386.[2]  While this seems contrary to the
point I am trying to make, I want to show that, for the most part, these
changes just lock in a reasonable compiler behavior.

Specifically, on x86_64/gcc (the most relevant arch/compiler for this
driver) the changes are restricted to kraken2_read:

1.  Loading of priv->updated and jiffies are reordered, because
(with 

Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Guenter Roeck
On 3/29/21 5:21 PM, Jonas Malaco wrote:
> On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
>> On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
>>> To avoid a spinlock, the driver explores concurrent memory accesses
>>> between _raw_event and _read, having the former updating fields on a
>>> data structure while the latter could be reading from them.  Because
>>> these are "plain" accesses, those are data races according to the Linux
>>> kernel memory model (LKMM).
>>>
>>> Data races are undefined behavior in both C11 and LKMM.  In practice,
>>> the compiler is free to make optimizations assuming there is no data
>>> race, including load tearing, load fusing and many others,[1] most of
>>> which could result in corruption of the values reported to user-space.
>>>
>>> Prevent undesirable optimizations to those concurrent accesses by
>>> marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
>>> data races, according to the LKMM, because both loads and stores to each
>>> location are now "marked" accesses.
>>>
>>> As a special case, use smp_load_acquire() and smp_load_release() when
>>> loading and storing ->updated, as it is used to track the validity of
>>> the other values, and thus has to be stored after and loaded before
>>> them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
>>> order of memory accesses.
>>>
>>> [1] https://lwn.net/Articles/793253/
>>>
>>
>> I think you lost me a bit there. What out-of-order accesses that would be
>> triggered by a compiler optimization are you concerned about here ?
>> The only "problem" I can think of is that priv->updated may have been
>> written before the actual values. The impact would be ... zero. An
>> attribute read would return "stale" data for a few microseconds.
>> Why is that a concern, and what difference does it make ?
> 
> The impact of out-of-order accesses to priv->updated is indeed minimal.
> 
> That said, smp_load_acquire() and smp_store_release() were meant to
> prevent reordering at runtime, and only affect architectures other than
> x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
> compiler optimizations, and x86 provides the load-acquire/store-release
> semantics by default.
> 
> But the reordering issue is not a concern to me, I got carried away when
> adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
> smp_store_release() make the code work more like I intend it to, they
> are (small) costs we can spare.
> 
> I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
> priv->updated.  Do you agree?
> 

No. What is the point ? The order of writes doesn't matter, the writes won't
be randomly dropped, and it doesn't matter if the reader reports old values
for a couple of microseconds either. This would be different if the values
were used as synchronization primitives or similar, but that isn't the case
here. As for priv->updated, if you are concerned about lost reports and
the 4th report is received a few microseconds before the read, I'd suggest
to loosen the interval a bit instead.

Supposedly we are getting reports every 500ms. We have two situations:
- More than three reports are lost, making priv->updated somewhat relevant.
  In this case, it doesn't matter if outdated values are reported for
  a few uS since most/many/some reports are outdated more than a second
  anyway.
- A report is received but old values are reported for a few uS. That
  doesn't matter either because reports are always outdated anyway by
  much more than a few uS anyway, and the code already tolerates up to
  2 seconds of lost reports.

Sorry, I completely fail to see the problem you are trying to solve here.

Guenter


Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Jonas Malaco
On Mon, Mar 29, 2021 at 02:53:39PM -0700, Guenter Roeck wrote:
> On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
> > To avoid a spinlock, the driver explores concurrent memory accesses
> > between _raw_event and _read, having the former updating fields on a
> > data structure while the latter could be reading from them.  Because
> > these are "plain" accesses, those are data races according to the Linux
> > kernel memory model (LKMM).
> > 
> > Data races are undefined behavior in both C11 and LKMM.  In practice,
> > the compiler is free to make optimizations assuming there is no data
> > race, including load tearing, load fusing and many others,[1] most of
> > which could result in corruption of the values reported to user-space.
> > 
> > Prevent undesirable optimizations to those concurrent accesses by
> > marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
> > data races, according to the LKMM, because both loads and stores to each
> > location are now "marked" accesses.
> > 
> > As a special case, use smp_load_acquire() and smp_load_release() when
> > loading and storing ->updated, as it is used to track the validity of
> > the other values, and thus has to be stored after and loaded before
> > them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
> > order of memory accesses.
> > 
> > [1] https://lwn.net/Articles/793253/
> > 
> 
> I think you lost me a bit there. What out-of-order accesses that would be
> triggered by a compiler optimization are you concerned about here ?
> The only "problem" I can think of is that priv->updated may have been
> written before the actual values. The impact would be ... zero. An
> attribute read would return "stale" data for a few microseconds.
> Why is that a concern, and what difference does it make ?

The impact of out-of-order accesses to priv->updated is indeed minimal.

That said, smp_load_acquire() and smp_store_release() were meant to
prevent reordering at runtime, and only affect architectures other than
x86.  READ_ONCE() and WRITE_ONCE() would already prevent reordering from
compiler optimizations, and x86 provides the load-acquire/store-release
semantics by default.

But the reordering issue is not a concern to me, I got carried away when
adding READ_ONCE()/WRITE_ONCE().  While smp_load_acquire() and
smp_store_release() make the code work more like I intend it to, they
are (small) costs we can spare.

I still think that READ_ONCE()/WRITE_ONCE() are necessary, including for
priv->updated.  Do you agree?

Thanks,
Jonas

P.S. Architectures other than x86 are admittedly a niche case for this
driver, but I would not rule them out.  Not only can the cooler be
adapted to mount on silicon other than mainstream Intel/AMD CPUs (and
there even exists a first-party adapter for graphics cards), but it can
trivially also be controlled by a secondary, possibly non-x86, system.


Re: [PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Guenter Roeck
On Mon, Mar 29, 2021 at 05:22:01AM -0300, Jonas Malaco wrote:
> To avoid a spinlock, the driver explores concurrent memory accesses
> between _raw_event and _read, having the former updating fields on a
> data structure while the latter could be reading from them.  Because
> these are "plain" accesses, those are data races according to the Linux
> kernel memory model (LKMM).
> 
> Data races are undefined behavior in both C11 and LKMM.  In practice,
> the compiler is free to make optimizations assuming there is no data
> race, including load tearing, load fusing and many others,[1] most of
> which could result in corruption of the values reported to user-space.
> 
> Prevent undesirable optimizations to those concurrent accesses by
> marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
> data races, according to the LKMM, because both loads and stores to each
> location are now "marked" accesses.
> 
> As a special case, use smp_load_acquire() and smp_load_release() when
> loading and storing ->updated, as it is used to track the validity of
> the other values, and thus has to be stored after and loaded before
> them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
> order of memory accesses.
> 
> [1] https://lwn.net/Articles/793253/
> 

I think you lost me a bit there. What out-of-order accesses that would be
triggered by a compiler optimization are you concerned about here ?
The only "problem" I can think of is that priv->updated may have been
written before the actual values. The impact would be ... zero. An
attribute read would return "stale" data for a few microseconds.
Why is that a concern, and what difference does it make ?

Thanks,
Guenter


[PATCH] hwmon: (nzxt-kraken2) mark and order concurrent accesses

2021-03-29 Thread Jonas Malaco
To avoid a spinlock, the driver explores concurrent memory accesses
between _raw_event and _read, having the former updating fields on a
data structure while the latter could be reading from them.  Because
these are "plain" accesses, those are data races according to the Linux
kernel memory model (LKMM).

Data races are undefined behavior in both C11 and LKMM.  In practice,
the compiler is free to make optimizations assuming there is no data
race, including load tearing, load fusing and many others,[1] most of
which could result in corruption of the values reported to user-space.

Prevent undesirable optimizations to those concurrent accesses by
marking them with READ_ONCE() and WRITE_ONCE().  This also removes the
data races, according to the LKMM, because both loads and stores to each
location are now "marked" accesses.

As a special case, use smp_load_acquire() and smp_load_release() when
loading and storing ->updated, as it is used to track the validity of
the other values, and thus has to be stored after and loaded before
them.  These imply READ_ONCE()/WRITE_ONCE() but also ensure the desired
order of memory accesses.

[1] https://lwn.net/Articles/793253/

Signed-off-by: Jonas Malaco 
---
 drivers/hwmon/nzxt-kraken2.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/hwmon/nzxt-kraken2.c b/drivers/hwmon/nzxt-kraken2.c
index 89f7ea4f42d4..f4fbc8771930 100644
--- a/drivers/hwmon/nzxt-kraken2.c
+++ b/drivers/hwmon/nzxt-kraken2.c
@@ -46,16 +46,22 @@ static int kraken2_read(struct device *dev, enum 
hwmon_sensor_types type,
u32 attr, int channel, long *val)
 {
struct kraken2_priv_data *priv = dev_get_drvdata(dev);
+   unsigned long expires;
 
-   if (time_after(jiffies, priv->updated + STATUS_VALIDITY * HZ))
+   /*
+* Order load from ->updated before the data it refers to.
+*/
+   expires = smp_load_acquire(>updated) + STATUS_VALIDITY * HZ;
+
+   if (time_after(jiffies, expires))
return -ENODATA;
 
switch (type) {
case hwmon_temp:
-   *val = priv->temp_input[channel];
+   *val = READ_ONCE(priv->temp_input[channel]);
break;
case hwmon_fan:
-   *val = priv->fan_input[channel];
+   *val = READ_ONCE(priv->fan_input[channel]);
break;
default:
return -EOPNOTSUPP; /* unreachable */
@@ -119,12 +125,15 @@ static int kraken2_raw_event(struct hid_device *hdev,
 * and that the missing steps are artifacts of how the firmware
 * processes the raw sensor data.
 */
-   priv->temp_input[0] = data[1] * 1000 + data[2] * 100;
+   WRITE_ONCE(priv->temp_input[0], data[1] * 1000 + data[2] * 100);
 
-   priv->fan_input[0] = get_unaligned_be16(data + 3);
-   priv->fan_input[1] = get_unaligned_be16(data + 5);
+   WRITE_ONCE(priv->fan_input[0], get_unaligned_be16(data + 3));
+   WRITE_ONCE(priv->fan_input[1], get_unaligned_be16(data + 5));
 
-   priv->updated = jiffies;
+   /*
+* Order store to ->updated after the data it refers to.
+*/
+   smp_store_release(>updated, jiffies);
 
return 0;
 }

base-commit: 644b9af5c605762feffac96bd7ea2499e0197656
-- 
2.31.1