64 helper

H. Peter Anvin Sun, 23 Mar 2025 19:19:37 -0700

On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu <visitor...@gmail.com> wrote:
>On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
>> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
>> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
>> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
>> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.no...@gmail.com> 
>> > > > wrote:
>> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
>> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight 
>> > > > >> > <david.laight.li...@gmail.com> wrote:
>> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
>> > > > >> > >"H. Peter Anvin" <h...@zytor.com> wrote:
>> > > > >> > >
>> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper 
>> > > > >> > >> <andrew.coop...@citrix.com> wrote:
>> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
>> > > > >> > >> >
>> > > > >> > >> >That's not technically correct any more.
>> > > > >> > >> >
>> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit 
>> > > > >> > >> >patterns
>> > > > >> > >> >other than 0 and 1.
>> > > > >> > >> >
>> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
>> > > > >> > >> >
>> > > > >> > >> >~Andrew  
>> > > > >> > >> 
>> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux 
>> > > > >> > >> kernel using them) but
>> > > > >> > >> for compiler-generated conversions that's still a given, or 
>> > > > >> > >> the language isn't C
>> > > > >> > >> or anything even remotely like it.
>> > > > >> > >> 
>> > > > >> > >
>> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
>> > > > >> > >The underlying problem is that values other than 'true' and 
>> > > > >> > >'false' can
>> > > > >> > >always get into 'bool' variables.
>> > > > >> > >
>> > > > >> > >Once that has happened it is all fubar.
>> > > > >> > >
>> > > > >> > >Trying to sanitise a value with (say):
>> > > > >> > >int f(bool v)
>> > > > >> > >{
>> > > > >> > > return (int)v & 1;
>> > > > >> > >}    
>> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> > > > >> > >
>> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
>> > > > >> > >What happens if the value is wrong? A trap or exception? Good 
>> > > > >> > >luck recovering
>> > > > >> > >from that.
>> > > > >> > >
>> > > > >> > > David
>> > > > >> > 
>> > > > >> > Did you just discover GIGO?
>> > > > >> 
>> > > > >> Thanks for all the suggestions.
>> > > > >> 
>> > > > >> I don't have a strong opinion on the naming or return type. I'm 
>> > > > >> still a
>> > > > >> bit confused about whether I can assume that casting bool to int 
>> > > > >> always
>> > > > >> results in 0 or 1.
>> > > > >> 
>> > > > >> If that's the case, since most people prefer bool over int as the
>> > > > >> return type and some are against introducing u1, my current plan is 
>> > > > >> to
>> > > > >> use the following in the next version:
>> > > > >> 
>> > > > >> bool parity_odd(u64 val);
>> > > > >> 
>> > > > >> This keeps the bool return type, renames the function for better
>> > > > >> clarity, and avoids extra maintenance burden by having just one
>> > > > >> function.
>> > > > >> 
>> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
>> > > > >> would it be acceptable to keep the return type as int?
>> > > > >> 
>> > > > >> Would this work for everyone?
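
For reference, a minimal C sketch of the conversion rule in question,
assuming a C99 (or later) compiler where bool is _Bool. The helper name
to_flag() is hypothetical and is not part of the patch series. It only
covers values that reach the bool through a normal conversion; a bool
whose storage was corrupted by other means is outside what the language
guarantees.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only.
 * Converting a scalar to bool compares it against zero, so the bool holds
 * 0 or 1, and converting that bool to int gives exactly 0 or 1.
 */
static inline int to_flag(unsigned long long val)
{
	bool b = val;	/* any non-zero value becomes true */

	return (int)b;	/* 0 or 1 */
}
```

With this, to_flag(0xaa) and to_flag(1ULL << 63) both give 1, since
conversion to bool tests against zero rather than truncating.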
>> > > > >
>> > > > >Alright, it's clearly a split opinion. So what I would do myself in
>> > > > >such a case is to look at existing code and see what people who really
>> > > > >need parity invent in their drivers:
>> > > > >
>> > > > >                                     bool      parity_odd
>> > > > >static inline int parity8(u8 val)       -               -
>> > > > >static u8 calc_parity(u8 val)           -               -
>> > > > >static int odd_parity(u8 c)             -               +
>> > > > >static int saa711x_odd_parity           -               +
>> > > > >static int max3100_do_parity            -               -
>> > > > >static inline int parity(unsigned x)    -               -
>> > > > >static int bit_parity(u32 pkt)          -               -
>> > > > >static int oa_tc6_get_parity(u32 p)     -               -
>> > > > >static u32 parity32(__le32 data)        -               -
>> > > > >static u32 parity(u32 sample)           -               -
>> > > > >static int get_parity(int number,       -               -
>> > > > >                      int size)
>> > > > >static bool i2cr_check_parity32(u32 v,  +               -
>> > > > >                        bool parity)
>> > > > >static bool i2cr_check_parity64(u64 v)  +               -
>> > > > >static int sw_parity(__u64 t)           -               -
>> > > > >static bool parity(u64 value)           +               -
>> > > > >
>> > > > >Now you can refer to that table and say that int parity(uXX) is what
>> > > > >people want to see in their drivers.
>> > > > >
>> > > > >Whichever interface you choose, please discuss its pros and cons.
>> > > > >What does bloat-o-meter say for each option? What's the maintenance burden?
>> > > > >Perf test? Look at generated code?
>> > > > >
>> > > > >I'm personally for a macro returning boolean, something like I
>> > > > >proposed at the very beginning.
>> > > > >
>> > > > >Thanks,
>> > > > >Yury
>> > > > 
>> > > > Also, please at least provide a way for an arch to opt in to using the 
>> > > > builtins, which seem to produce as good results or better at least on 
>> > > > some architectures like x86 and probably with CPU options that imply 
>> > > > fast popcnt is available.
>> > > 
>> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
>> > > the simplest way would be wrapping generic implementation with
>> > > the #ifndef parity, similarly to how we handle find_next_bit case.
>> > > 
>> > > So:
>> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
>> > > 2. This may, and probably should, be a separate follow-up series,
>> > >    likely created by corresponding arch experts.
>> > > 
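
A rough sketch of the #ifndef wrapping described above, collapsed into
one file for illustration. Assumptions: the function is called
parity_odd() as proposed earlier in the thread, the first half stands in
for an arch header and the second half for the generic header, and the
arch version uses the GCC/Clang __builtin_parityll() builtin. This is
not the code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for an arch header. Assumption: the arch opts in by providing
 * its own parity_odd() and defining a macro of the same name, with no
 * separate ARCH_HAS_* symbol.
 */
#ifdef __GNUC__
static inline bool parity_odd(uint64_t val)
{
	return __builtin_parityll(val);
}
#define parity_odd parity_odd
#endif

/* Stand-in for the generic header: compiled only if nothing overrode it. */
#ifndef parity_odd
static inline bool parity_odd(uint64_t val)
{
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	/* Bit i of 0x6996 is the parity of the 4-bit value i. */
	return (0x6996 >> (val & 0xf)) & 1;
}
#endif
```

Callers use parity_odd() either way; which body they get depends only on
whether the earlier header defined the macro.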
>> > I saw discussions in the previous email thread about both
>> > __builtin_parity and x86-specific implementations. However, from the
>> > discussion, I learned that before considering any optimization, we
>> > should first ask: which driver or subsystem actually cares about parity
>> > efficiency? If someone does, I can help with a micro-benchmark to
>> > provide performance numbers, but I don't have enough domain knowledge
>> > to identify hot paths where parity efficiency matters.
>> > 
>> IMHO,
>> 
>> If parity is never used in any hot path and we don't care about parity:
>> 
>> Then benchmarking its performance seems meaningless. In this case, a
>> function with a u64 argument would suffice, and we might not even need
>> a macro to optimize for different types—especially since the macro
>> requires special hacks to avoid compiler warnings. Also, I don't think
>> code size matters here. If it does, we should first consider making
>> parity a non-inline function in a .c file rather than an inline
>> function/macro in a header.
>> 
>> If parity is used in a hot path:
>> 
>> We need different handling for different type sizes. As previously
>> discussed, x86 assembly might use different instructions for u8 and
>> u16. This may sound stubborn, but I want to ask again: should we
>> consider using parity8/16/32/64 interfaces? Like in the i3c driver
>> example, if we only have a single parity macro that selects an
>> implementation based on type size, users must explicitly cast types.
>> If future users also need parity in a hot path, they might not be aware
>> of this requirement and end up generating suboptimal code. Since we
>> care about efficiency and generated code, why not follow hweight() and
>> provide separate implementations for different sizes?
>> 
>It seems no one will reply to my two emails. So, I have summarized
>different interface approaches. If there is a next version, I will send
>it after the merge window closes.
>
>Interface 1: Single Function
>Description: bool parity_odd(u64)
>Pros: Minimal maintenance cost
>Cons: Difficult to integrate with architecture-specific implementations
>      due to the inability to optimize for different argument sizes
>Opinions: Jiri supports this approach
>
>Interface 2: Single Macro
>Description: parity_odd() macro
>Pros: Allows type-specific implementation
>Cons: Requires hacks to avoid warnings; users may need explicit
>      casting; potential sub-optimal code on 32-bit x86
>Opinions: Yury supports this approach
>
>Interface 3: Multiple Functions
>Description: bool parity_odd8/16/32/64()
>Pros: No need for explicit casting; easy to integrate
>      architecture-specific optimizations; except for parity8(), all
>      functions are one-liners with no significant code duplication
>Cons: More functions may increase maintenance burden
>Opinions: Only I support this approach
>
>Regards,
>Kuan-Wei


You can add me to the final option. I think it makes the most sense.
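
For illustration, a minimal sketch of what the multiple-function
interface could look like. The names follow the parity_odd8/16/32/64()
naming from the summary above; the bodies are an assumption (a nibble
lookup for the 8-bit case and XOR folding for the wider ones), not the
code from the series, and an arch could replace any of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fold the byte down to a nibble, then look its parity up in 0x6996. */
static inline bool parity_odd8(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Each wider variant XORs its two halves and defers to the next size down. */
static inline bool parity_odd16(uint16_t val)
{
	return parity_odd8(val ^ (val >> 8));
}

static inline bool parity_odd32(uint32_t val)
{
	return parity_odd16(val ^ (val >> 16));
}

static inline bool parity_odd64(uint64_t val)
{
	return parity_odd32(val ^ (val >> 32));
}
```

A caller picks the variant matching the width of its data, for example
parity_odd8(byte) or parity_odd32(word), so no casts are needed at the
call site.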

Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
