https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122041
--- Comment #1 from Petr Sumbera <sumbera at volny dot cz> ---
While I'm not an expert on GCC or SPARC assembler, I'm adding a comment from an AI:

The GCC build of crc32_update_no_xor shows several patterns that may explain the slower runtime compared to Solaris Studio:

Extra call: early in the function there is a call +0x0 (a call to the next instruction). This looks like a placeholder or a missed optimization, and it is absent from the Studio output.

Instruction mix: GCC emits a long sequence of shifts (sllx, srlx) and bitwise operations for each 8-byte block. Solaris Studio builds the 64-bit word from individual bytes and then performs fewer shifts and table lookups.

Loop structure: Studio's code shows clear unrolling and tight pointer arithmetic. GCC appears to perform more loads and logical operations per iteration, which likely increases the instruction count and the pressure on the integer units.

Table access: GCC repeatedly loads from the CRC table (ld [%l3 + …]) inside the main loop, whereas Studio seems to combine lookups and XORs more efficiently.
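
For readers who do not have the source at hand, here is a rough C sketch of the kind of loop being discussed: a table-driven CRC-32 that consumes 8 bytes per iteration ("slicing-by-8"). It is an illustration only. The actual source of crc32_update_no_xor is not quoted in this comment and may differ; the names crc32_slice8_no_xor, crc_table and init_tables are hypothetical, and the reflected polynomial 0xEDB88320 is an assumption. The sketch assembles each word from individual bytes, which is the approach attributed to the Studio output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not taken from the code discussed in the bug. */
static uint32_t crc_table[8][256];
static int table_ready;

/* Build the eight lookup tables for the reflected CRC-32 polynomial
   (0xEDB88320 is assumed here). */
static void init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[0][i] = c;
    }
    /* Table t advances the state of table t-1 by one more zero byte. */
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 8; t++)
            crc_table[t][i] = (crc_table[t - 1][i] >> 8)
                            ^ crc_table[0][crc_table[t - 1][i] & 0xFF];
    table_ready = 1;
}

/* Update crc over len bytes, without the initial/final inversion. */
uint32_t crc32_slice8_no_xor(uint32_t crc, const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    if (!table_ready)
        init_tables();

    while (len >= 8) {
        /* Assemble two little-endian 32-bit words byte by byte, so the
           loop needs no aligned 64-bit load and no byte swap on a
           big-endian target such as SPARC. */
        uint32_t lo = crc ^ ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        uint32_t hi = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                      (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;

        /* Eight independent table lookups combined with XOR. */
        crc = crc_table[7][lo & 0xFF] ^ crc_table[6][(lo >> 8) & 0xFF]
            ^ crc_table[5][(lo >> 16) & 0xFF] ^ crc_table[4][lo >> 24]
            ^ crc_table[3][hi & 0xFF] ^ crc_table[2][(hi >> 8) & 0xFF]
            ^ crc_table[1][(hi >> 16) & 0xFF] ^ crc_table[0][hi >> 24];
        p += 8;
        len -= 8;
    }

    /* Remaining 0..7 bytes, one table lookup per byte. */
    while (len--)
        crc = (crc >> 8) ^ crc_table[0][(crc ^ *p++) & 0xFF];
    return crc;
}
```

With the usual pre- and post-inversion, crc32_slice8_no_xor(0xFFFFFFFF, "123456789", 9) ^ 0xFFFFFFFF gives the standard CRC-32 check value 0xCBF43926. How a compiler schedules the eight lookups and the byte assembly in the main loop is where the instruction-count differences described above would show up.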