https://sourceware.org/bugzilla/show_bug.cgi?id=34104

--- Comment #2 from Den <archicharmer at mail dot ru> ---
Hello, Maciej. Thank you for your feedback.

To answer your first question, about my 'roadblock' in tc-mips.c: I'm not
deeply familiar with the inner workings of GAS and the BFD subsystem, so
formulating the exact macro expansion syntax turned out to be quite daunting.
For instance, for the 32-bit ctz sequence, I imagine the draft implementation
would look something like this:
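
    case M_CTZ_R5900:
      used_at = 1;
      /* AT = (op[1] & -op[1]) - 1: a mask covering the trailing zeros.  */
      macro_build (NULL, "subu", "d,v,t", AT, ZERO, op[1]);
      macro_build (NULL, "and", "d,v,t", AT, op[1], AT);
      macro_build (NULL, "addiu", "t,r,j", AT, AT, -1);
      /* op[0] = (op[1] == 0); sll/sra below widen it to 0 or -1.  */
      macro_build (NULL, "sltiu", "t,r,j", op[0], op[1], 1);
      /* AT = 31 - ctz after plzcw; the xori below turns that into ctz.  */
      macro_build (NULL, "plzcw", "d,s", AT, AT);
      macro_build (NULL, "sll", "d,w,<", op[0], op[0], 31);
      macro_build (NULL, "xori", "t,r,i", AT, AT, 31);
      macro_build (NULL, "sra", "d,w,<", op[0], op[0], 31);
      /* op[0] = ctz, or -1 when the input was zero.  */
      macro_build (NULL, "or", "d,v,t", op[0], op[0], AT);
      break;
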
I struggled with the ambiguity of the GAS internal API. I was unsure whether
passing NULL as the first argument (expressionS *ep) for purely
register-and-immediate instructions was correct, and I got bogged down looking
at other macros in tc-mips.c that rely on complex relocation flags (like
BFD_RELOC_MIPS_HIGHER) or internal formatting tokens (like SHFT_FMT). All I did
was put the instructions in the intended order, work out the "d,v,t",
"t,r,j", etc. format strings from mips-opc.c, and map the operands. Beyond
that, I would rather leave the remaining boilerplate and internal GAS mechanics
to a maintainer who knows this codebase well.
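
To make the intended data flow of the ctz draft easier to check, the sequence
can be modelled in plain C. This is only a sketch under stated assumptions: the
names plzcw_model, ctz_r5900_model and check_model are hypothetical, and the
model assumes that plzcw, for a 32-bit word, returns the number of leading bits
equal to the sign bit, minus one. It says nothing about whether the
macro_build calls themselves are correct.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PLZCW behaviour for one 32-bit word: the number of leading bits
   that equal the sign bit, minus one (so the result is in 0..31).  */
static uint32_t plzcw_model(uint32_t x)
{
    uint32_t sign = x >> 31;
    uint32_t count = 0;

    /* Bit 31 always matches itself, so start at bit 30; this yields
       "leading matching bits minus one" directly.  */
    for (int bit = 30; bit >= 0 && ((x >> bit) & 1u) == sign; bit--)
        count++;
    return count;
}

/* Step-by-step model of the draft sequence; rs is op[1], rd is op[0].  */
static int32_t ctz_r5900_model(uint32_t rs)
{
    uint32_t at, rd;

    at = 0u - rs;                       /* subu  AT, $0, rs  */
    at = rs & at;                       /* and   AT, rs, AT  */
    at = at - 1u;                       /* addiu AT, AT, -1  */
    rd = (rs < 1u) ? 1u : 0u;           /* sltiu rd, rs, 1   */
    at = plzcw_model(at);               /* plzcw AT, AT      */
    rd = rd << 31;                      /* sll   rd, rd, 31  */
    at = at ^ 31u;                      /* xori  AT, AT, 31  */
    rd = (rd >> 31) ? 0xFFFFFFFFu : 0u; /* sra   rd, rd, 31  */
    rd = rd | at;                       /* or    rd, rd, AT  */

    /* Avoid an implementation-defined unsigned-to-signed conversion.  */
    return (rd == 0xFFFFFFFFu) ? -1 : (int32_t) rd;
}

/* Checks the model for every single-bit input, for inputs with all bits
   set from position k upwards, and for zero.  */
static void check_model(void)
{
    for (int k = 0; k < 32; k++) {
        assert(ctz_r5900_model((uint32_t) 1u << k) == k);
        assert(ctz_r5900_model((uint32_t) (0xFFFFFFFFu << k)) == k);
    }
    assert(ctz_r5900_model(0u) == -1);
}
```

Calling check_model() from a small main() exercises the model. Under the
assumption above, the sequence yields the trailing-zero count for nonzero
inputs and -1 for a zero input.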

Regarding your question about the intended use case and why I implemented so
many sequences: my original, primary goal was to provide a fast clz macro for
the R5900, as it is the foundational building block for most bit-scanning
operations. However, when looking at mips-opc.c and mips.md, I noticed that
MIPS toolchains strictly enforce architectural symmetry through macros like
ISA_HAS_CLZ_CLO and ISA_HAS_CTZ_CTO. Since the presence of one opcode typically
implies the availability of its twin and their respective doubleword variants,
I decided to build the complete set to ensure the patch is architecturally
exhaustive and doesn't leave gaps in the ISA mapping. Ultimately, the goal
became enabling the ISA_HAS_CLZ_CLO and ISA_HAS_CTZ_CTO conditional flags for
R5900 targets within the compiler, as I originally thought that having the
appropriate macros in binutils was a necessary prerequisite for this.

Regarding the edge cases, including the zero case mentioned in your third note:
as shown in my attached test logs, I carefully aligned the outputs of the
branchless assembler sequences to match, bit for bit, the values returned by
the original libgcc functions (such as __ctzsi2) under the current O32 ABI
toolchain. Since the existing environment already yields these specific values
for the edge cases, these implementations serve as a strict drop-in
replacement, preserving behavioral compatibility while eliminating the library
call overhead.

As for the benchmarks: I haven't designed a dedicated benchmarking suite, so I
have no precise timing data, but I expect the advantage to be clear from the
instruction counts alone. Currently, with the O32 toolchain, operations such as
ctz and clz fall back to generic libgcc calls (__ctzsi2, __clzsi2), which
introduce function call overhead (jal/jr), stack manipulation, branching, and
memory lookups (pulling values from RAM/cache). Replacing such a routine with a
tight, branchless sequence of just 4 to 7 register-only instructions built
around the native plzcw opcode should substantially reduce CPU cycle
consumption.

So, if I understand correctly, it is better to skip creating these macros in
GAS entirely and focus on making edits in GCC only? If that is the case, should
I start a new thread/ticket on the GCC side, or can we continue tracking the
compiler-side implementation here?
