Oh, man. I was wondering why, when I built my assembly version:

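#include "textflag.h"

// Ctz counts trailing zero bits. TZCNT needs BMI1 and yields 64 for 0.
// Assumed Go declaration: func Ctz(val uint64) uint8
TEXT ·Ctz(SB),NOSPLIT,$0-9
    MOVQ val+0(FP), AX  // value to be counted
    TZCNTQ AX, BX       // BX = trailing zero count
    MOVB BX, ret+8(FP)  // 1-byte result at offset 8, so args+results = 9 bytes
    RET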

I got this:

$ go build -o hybrid8b.o .
$ go tool objdump -s hybrid8b.Ctz hybrid8b.o
warning: GOPATH set to GOROOT (/home/samv/go) has no effect
TEXT .../pkg/hybrid8b.Ctz(SB) gofile..<autogenerated>
  gofile..<autogenerated>:1 0x7ee4d 4883ec18 SUBQ $0x18, SP
  gofile..<autogenerated>:1 0x7ee51 48896c2410 MOVQ BP, 0x10(SP)
  gofile..<autogenerated>:1 0x7ee56 488d6c2410 LEAQ 0x10(SP), BP
  gofile..<autogenerated>:1 0x7ee5b 48890424 MOVQ AX, 0(SP)
  gofile..<autogenerated>:1 0x7ee5f e800000000 CALL 0x7ee64 [1:5]R_CALL:.../pkg/hybrid8b.Ctz
  gofile..<autogenerated>:1 0x7ee64 450f57ff XORPS X15, X15
  gofile..<autogenerated>:1 0x7ee68 644c8b342500000000 MOVQ FS:0, R14 [5:9]R_TLS_LE
  gofile..<autogenerated>:1 0x7ee71 0fb6442408 MOVZX 0x8(SP), AX
  gofile..<autogenerated>:1 0x7ee76 488b6c2410 MOVQ 0x10(SP), BP
  gofile..<autogenerated>:1 0x7ee7b 4883c418 ADDQ $0x18, SP
  gofile..<autogenerated>:1 0x7ee7f c3 RET

By comparison, here's what I got by using math/bits.TrailingZeros64:

$ go build -tags noasm -o hybrid8b.o .
$ go tool objdump -s hybrid8b.Ctz hybrid8b.o
warning: GOPATH set to GOROOT (/home/samv/go) has no effect
TEXT .../pkg/hybrid8b.Ctz(SB) gofile../home/samv/.../pkg/hybrid8b/primitives_misc.go
  primitives_misc.go:34 0x62130 480fbcc0 BSFQ AX, AX
  primitives_misc.go:34 0x62134 b940000000 MOVL $0x40, CX
  primitives_misc.go:34 0x62139 480f44c1 CMOVE CX, AX
  primitives_misc.go:34 0x6213d c3 RET
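For comparison, the `noasm` path presumably wraps `bits.TrailingZeros64` along these lines. The body of primitives_misc.go isn't shown, so the signature here (`func Ctz(val uint64) uint8`, guessed from the 1-byte `MOVB` store in the assembly) is an assumption, and a real package would put this file behind a build constraint selecting the `noasm` tag, omitted here so the sketch builds on its own.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Ctz is a hypothetical pure-Go counterpart of the assembly Ctz.
// In a real package this file would sit behind a build constraint that
// selects the "noasm" tag (not shown, so this sketch builds standalone).
func Ctz(val uint64) uint8 {
	// On amd64 the compiler replaces this call with an instruction
	// sequence instead of running the math/bits Go code.
	return uint8(bits.TrailingZeros64(val))
}

func main() {
	fmt.Println(Ctz(0), Ctz(1), Ctz(0x50), Ctz(1<<63))
}
```

This is consistent with the objdump output above: the whole function body is just BSFQ, MOVL, CMOVE and RET, with no call and no wrapper.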

That's an interesting thing for the compiler to emit from this source:

func TrailingZeros64(x uint64) int {
    if x == 0 {
        return 64
    }
    // ...
    return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
}
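The excerpt above elides the constants. Here is a self-contained sketch of the same de Bruijn technique for anyone who wants to run the portable path directly. The constant is the one math/bits uses; building the lookup table at init time from that constant (rather than spelling out the 64-entry literal as math/bits does) and the lowercase name `trailingZeros64` are choices made for this sketch only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// deBruijn64 is a 64-bit de Bruijn constant: every 6-bit window of its
// bits (with zeros shifted in at the bottom) is distinct.
const deBruijn64 = 0x03f79d71b4ca8b09

// deBruijn64tab maps the top 6 bits of deBruijn64<<k back to k.
var deBruijn64tab [64]byte

func init() {
	for k := 0; k < 64; k++ {
		deBruijn64tab[(uint64(deBruijn64)<<k)>>(64-6)] = byte(k)
	}
}

// trailingZeros64 is a hypothetical standalone copy of the portable path.
func trailingZeros64(x uint64) int {
	if x == 0 {
		return 64
	}
	// x & -x isolates the lowest set bit, i.e. 1<<k. Multiplying by it is
	// a left shift by k, so the top 6 bits of the product identify k.
	return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
}

func main() {
	for _, x := range []uint64{1, 0x50, 0xdeadbeef00, 1 << 63} {
		fmt.Println(x, trailingZeros64(x), bits.TrailingZeros64(x))
	}
}
```

On amd64, direct calls to `bits.TrailingZeros64` are replaced by the BSFQ/CMOVE sequence shown above instead of running this table lookup.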

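My guess is that this is some kind of targeted function replacement done
by the compiler. That's what I was hoping to get by writing the assembly
function, but it seems I'm being bitten by the calling-convention glue
instead: the wrapper spills the argument from AX to the stack, calls my
function, re-zeroes X15 and reloads R14 (which Go's register ABI expects
to hold zero and the current goroutine), and then reads the result back
off the stack.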

I guess my core problem is already solved, but my questions are:

(a) where can I find how this specific optimization is defined?

(b) is it possible to write assembly functions that avoid the wrapper code, 
assuming that one follows the platform's calling convention?

Sam

On Tuesday, 25 April 2023 at 13:48:57 UTC-4 Ian Lance Taylor wrote:

> On Tue, Apr 25, 2023 at 10:03 AM Sam Vilain <s...@vilain.net> wrote:
> >
> > Looks like `math/bits` could use some assembly alternatives, too. Clever
> > as those functions are, they're almost certainly not going to beat the
> > microcode/silicon. I'm looking into the contribution guidelines now!
>
> Note that many of the math/bits functions are actually implemented
> directly by the compiler. Check the final executable on your
> processors of interest.
>
> Ian
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/bae1d3ec-8077-4540-8154-346223063d81n%40googlegroups.com.
