> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Friday, 17 May 2024 18.18 > > On Fri, 17 May 2024 08:44:42 +0200 > Morten Brørup <m...@smartsharesystems.com> wrote: > > > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > > > Sent: Friday, 17 May 2024 06.27 > > > > > > + Bruce for feedback on x86 architecture > > > > > > > On May 16, 2024, at 10:30 PM, Stephen Hemminger > <step...@networkplumber.org> > > > wrote: > > > > > > > > On Fri, 17 May 2024 02:45:12 +0000 > > > > Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> wrote: > > > > > > > >>> + * A counter is 64 bit value that is safe from split read/write > > > >>> + * on 32 bit platforms. It assumes that only one cpu at a time > > > >> If we are defining the counter in this manner, then > implementation cannot > > > be generic. I think architectures will have constraints if they have > to ensure > > > the 64b variables are not split. > > > >> > > > >> I think we at least need the counter to be aligned on 8B boundary > to have > > > generic code. > > > > > > > > The C standard has always guaranteed that read and write to > unsigned log > > > will not be split. > > > As I understand, this is true only if the variable is an atomic > variable. If > > > not, there is no difference between atomic variables and non-atomic > variables. > > > > > > > Therefore if arch is 64 bit native there is no need for atomics > > > At least on ARM architecture, if the variable is not aligned on 8B > boundary, > > > the load or store are not atomic. I am sure it is the same on other > > > architectures. > > After reading this: Who's afraid of a big bad optimizing compiler? > https://lwn.net/Articles/793253/
Very interesting article! > > Looks like you are right, and atomic or read/write once is required. I don't like the performance tradeoff (for 64 bit architectures) in the v7 patch. For single-tread updated counters, we MUST have a high performance counter_add(), not using atomic read-modify-write. IMO calling counter_fetch() MUST be possible from another thread. This requires that the fast path thread stores the counter atomically (or using volatile), to ensure that the counter is not only kept in a CPU register, but stored in memory visible by other threads. For counter_reset(), we have multiple options: 0. Don't support counter resetting. Not really on option? 1. Only allow calling counter_reset() in the fast path thread updating the counter. This introduces no further requirements. 2. Allow calling counter_reset() from another thread, thereby zeroing the "counter" variable. This introduces a requirement for the "counter" to be thread-safe, so counter_add() must atomically read-modify-write the counter, which has a performance cost. 3. Allow calling counter_reset() from another thread, and introduce an "offset" variable for counter_fetch() and counter_reset() to provide thread-safe fetch/reset from other threads, using the consume-release pattern. I don't like option 2. I consider counters too important and frequently used in the fast path, to compromise on performance for counters. For counters updated by multiple fast path threads, atomic_fetch_add_explicit() of the "counter" variable seems unavoidable. > Perhaps introducing rte_read_once and rte_write_once is good idea? > Several drivers are implementing it already. The read_once/write_once are easier to use, but they lack the flexibility (regarding barriers and locking) provided by their atomic_explicit alternatives, which will impact performance in some use cases. We should strive for the highest possible performance, which means that we shouldn't introduce functions or design patterns preventing this. Please note: Security vs. performance is another matter - we certainly don't want to promote insecure code for the benefit of performance. But for "ease of coding" vs. performance, I prefer performance. That said, I agree that introducing rte_read/write_once functions for use by drivers to access hardware makes sense, to eliminate copy-paste variants in drivers. But how can we prevent such functions from being used for other purposes, where atomic_explicit should be used?