https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122041

--- Comment #1 from Petr Sumbera <sumbera at volny dot cz> ---
While I'm not an expert on GCC or SPARC assembler, I'm adding a comment from an AI:

The GCC build of crc32_update_no_xor shows several patterns that may explain
the slower runtime compared to Solaris Studio:

Extra call – early in the function there is a call +0x0 (a call to the next
instruction).
This looks like a placeholder or missed optimization and is absent from the
Studio output.

Instruction mix – GCC emits a long sequence of shifts (sllx, srlx) and bitwise
operations for each 8-byte block.
Solaris Studio builds the 64-bit word from individual bytes and then performs
fewer shifts and table lookups.
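
To make the two instruction patterns concrete, here is a minimal C sketch. It
is not taken from the function under discussion: the helper names
(load_le64_bytes, load_native64, byte_of) are hypothetical, and the
little-endian byte order in the first helper is an assumption chosen for
illustration. The first helper builds the word from individual byte loads; the
other two load the whole word at once and then pick bytes out with shifts and
masks, which is the kind of source that tends to compile to srlx/and
sequences.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble a 64-bit word from eight single-byte loads.
   Little-endian order is an assumption made for this sketch. */
static uint64_t load_le64_bytes(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Load eight bytes in host byte order; memcpy keeps this well defined
   even when p is not 8-byte aligned. */
static uint64_t load_native64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Extract byte i (0 = least significant) with a shift and a mask. */
static unsigned byte_of(uint64_t w, int i)
{
    return (unsigned)((w >> (8 * i)) & 0xff);
}
```

Which form is cheaper depends on alignment guarantees and on how the compiler
schedules the loads, so this only illustrates the difference and does not
explain the measured slowdown.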

Loop structure – Studio’s code shows clear unrolling and tight pointer
arithmetic.
GCC appears to perform more loads and logical operations per iteration, which
likely increases instruction count and pressure on the integer units.

Table access – GCC repeatedly loads from the CRC table (ld [%l3 + …]) inside
the main loop, whereas Studio seems to combine lookups and XORs more
efficiently.
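
For reference, the table-lookup pattern mentioned above can be sketched as a
byte-at-a-time CRC-32 update in C. This is an assumption-laden illustration and
not the code being compiled: the names crc_table, crc_table_init and
crc32_no_xor_sketch are hypothetical, and the reflected polynomial 0xEDB88320
is the common CRC-32 one, assumed here. Each input byte costs one table load
and one XOR, which is roughly the work a load such as ld [%l3 + …] would be
doing in the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-entry lookup table for the reflected CRC-32 polynomial. */
static uint32_t crc_table[256];

/* Fill the table; must be called once before the update function. */
static void crc_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[n] = c;
    }
}

/* Update crc over buf without the initial and final inversion;
   the caller applies those if a standard CRC-32 value is wanted. */
static uint32_t crc32_no_xor_sketch(uint32_t crc, const unsigned char *buf,
                                    size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xff] ^ (crc >> 8);
    return crc;
}
```

Wider variants process several bytes per iteration with several tables, which
changes how many loads and XORs appear per loop iteration; whether either
compiler's output corresponds to such a variant would need to be checked
against the actual source.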
