On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu <visitor...@gmail.com> wrote: >On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote: >> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote: >> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote: >> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote: >> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.no...@gmail.com> >> > > > wrote: >> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote: >> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote: >> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight >> > > > >> > <david.laight.li...@gmail.com> wrote: >> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800 >> > > > >> > >"H. Peter Anvin" <h...@zytor.com> wrote: >> > > > >> > > >> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper >> > > > >> > >> <andrew.coop...@citrix.com> wrote: >> > > > >> > >> >> (int)true most definitely is guaranteed to be 1. >> > > > >> > >> > >> > > > >> > >> >That's not technically correct any more. >> > > > >> > >> > >> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit >> > > > >> > >> >patterns >> > > > >> > >> >other than 0 and 1. >> > > > >> > >> > >> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html >> > > > >> > >> > >> > > > >> > >> >~Andrew >> > > > >> > >> >> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux >> > > > >> > >> kernel using them) but >> > > > >> > >> for compiler-generated conversations that's still a given, or >> > > > >> > >> the manager isn't C >> > > > >> > >> or anything even remotely like it. >> > > > >> > >> >> > > > >> > > >> > > > >> > >The whole idea of 'bool' is pretty much broken by design. >> > > > >> > >The underlying problem is that values other than 'true' and >> > > > >> > >'false' can >> > > > >> > >always get into 'bool' variables. >> > > > >> > > >> > > > >> > >Once that has happened it is all fubar. >> > > > >> > > >> > > > >> > >Trying to sanitise a value with (say): >> > > > >> > >int f(bool v) >> > > > >> > >{ >> > > > >> > > return (int)v & 1; >> > > > >> > >} >> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j) >> > > > >> > > >> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps. >> > > > >> > >What happens if the value is wrong? a trap or exception?, good >> > > > >> > >luck recovering >> > > > >> > >from that. >> > > > >> > > >> > > > >> > > David >> > > > >> > >> > > > >> > Did you just discover GIGO? >> > > > >> >> > > > >> Thanks for all the suggestions. >> > > > >> >> > > > >> I don't have a strong opinion on the naming or return type. I'm >> > > > >> still a >> > > > >> bit confused about whether I can assume that casting bool to int >> > > > >> always >> > > > >> results in 0 or 1. >> > > > >> >> > > > >> If that's the case, since most people prefer bool over int as the >> > > > >> return type and some are against introducing u1, my current plan is >> > > > >> to >> > > > >> use the following in the next version: >> > > > >> >> > > > >> bool parity_odd(u64 val); >> > > > >> >> > > > >> This keeps the bool return type, renames the function for better >> > > > >> clarity, and avoids extra maintenance burden by having just one >> > > > >> function. >> > > > >> >> > > > >> If I can't assume that casting bool to int always results in 0 or 1, >> > > > >> would it be acceptable to keep the return type as int? >> > > > >> >> > > > >> Would this work for everyone? >> > > > > >> > > > >Alright, it's clearly a split opinion. So what I would do myself in >> > > > >such case is to look at existing code and see what people who really >> > > > >need parity invent in their drivers: >> > > > > >> > > > > bool parity_odd >> > > > >static inline int parity8(u8 val) - - >> > > > >static u8 calc_parity(u8 val) - - >> > > > >static int odd_parity(u8 c) - + >> > > > >static int saa711x_odd_parity - + >> > > > >static int max3100_do_parity - - >> > > > >static inline int parity(unsigned x) - - >> > > > >static int bit_parity(u32 pkt) - - >> > > > >static int oa_tc6_get_parity(u32 p) - - >> > > > >static u32 parity32(__le32 data) - - >> > > > >static u32 parity(u32 sample) - - >> > > > >static int get_parity(int number, - - >> > > > > int size) >> > > > >static bool i2cr_check_parity32(u32 v, + - >> > > > > bool parity) >> > > > >static bool i2cr_check_parity64(u64 v) + - >> > > > >static int sw_parity(__u64 t) - - >> > > > >static bool parity(u64 value) + - >> > > > > >> > > > >Now you can refer to that table say that int parity(uXX) is what >> > > > >people want to see in their drivers. >> > > > > >> > > > >Whichever interface you choose, please discuss it's pros and cons. >> > > > >What bloat-o-meter says for each option? What's maintenance burden? >> > > > >Perf test? Look at generated code? >> > > > > >> > > > >I personally for a macro returning boolean, something like I >> > > > >proposed at the very beginning. >> > > > > >> > > > >Thanks, >> > > > >Yury >> > > > >> > > > Also, please at least provide a way for an arch to opt in to using the >> > > > builtins, which seem to produce as good results or better at least on >> > > > some architectures like x86 and probably with CPU options that imply >> > > > fast popcnt is available. >> > > >> > > Yeah. And because linux/bitops.h already includes asm/bitops.h >> > > the simplest way would be wrapping generic implementation with >> > > the #ifndef parity, similarly to how we handle find_next_bit case. >> > > >> > > So: >> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY; >> > > 2. This may, and probably should, be a separate follow-up series, >> > > likely created by corresponding arch experts. >> > > >> > I saw discussions in the previous email thread about both >> > __builtin_parity and x86-specific implementations. However, from the >> > discussion, I learned that before considering any optimization, we >> > should first ask: which driver or subsystem actually cares about parity >> > efficiency? If someone does, I can help with a micro-benchmark to >> > provide performance numbers, but I don't have enough domain knowledge >> > to identify hot paths where parity efficiency matters. >> > >> IMHO, >> >> If parity is never used in any hot path and we don't care about parity: >> >> Then benchmarking its performance seems meaningless. In this case, a >> function with a u64 argument would suffice, and we might not even need >> a macro to optimize for different types—especially since the macro >> requires special hacks to avoid compiler warnings. Also, I don't think >> code size matters here. If it does, we should first consider making >> parity a non-inline function in a .c file rather than an inline >> function/macro in a header. >> >> If parity is used in a hot path: >> >> We need different handling for different type sizes. As previously >> discussed, x86 assembly might use different instructions for u8 and >> u16. This may sound stubborn, but I want to ask again: should we >> consider using parity8/16/32/64 interfaces? Like in the i3c driver >> example, if we only have a single parity macro that selects an >> implementation based on type size, users must explicitly cast types. >> If future users also need parity in a hot path, they might not be aware >> of this requirement and end up generating suboptimal code. Since we >> care about efficiency and generated code, why not follow hweight() and >> provide separate implementations for different sizes? >> >It seems no one will reply to my two emails. So, I have summarized >different interface approaches. If there is a next version, I will send >it after the merge window closes. > >Interface 1: Single Function >Description: bool parity_odd(u64) >Pros: Minimal maintenance cost >Cons: Difficult to integrate with architecture-specific implementations > due to the inability to optimize for different argument sizes >Opinions: Jiri supports this approach > >Interface 2: Single Macro >Description: parity_odd() macro >Pros: Allows type-specific implementation >Cons: Requires hacks to avoid warnings; users may need explicit > casting; potential sub-optimal code on 32-bit x86 >Opinions: Yury supports this approach > >Interface 3: Multiple Functions >Description: bool parity_odd8/16/32/64() >Pros: No need for explicit casting; easy to integrate > architecture-specific optimizations; except for parity8(), all > functions are one-liners with no significant code duplication >Cons: More functions may increase maintenance burden >Opinions: Only I support this approach > >Regards, >Kuan-Wei
You can add me to the final option. I think it makes most sense