[go-nuts] Re: Table-driven benchmarks defeat inlining

Paul S. R. Chisholm Sun, 06 Jun 2021 05:34:46 -0700

Thanks! I'd seen the "dead code elimination" comment somewhere and
questioned it, but not enough.


If I worry about what some future Go compiler might optimize, I end up
worrying quite a lot! For example, could this code:

func BenchmarkPopCountAlive(b *testing.B) {
    sum = 0
    for i := 0; i < b.N; i++ {
        sum += PopCount(0x1234567890abcdef)
    }
}

hypothetically be optimized to:

func BenchmarkPopCountAlive(b *testing.B) {
    sum = PopCount(0x1234567890abcdef) * b.N
}

since PopCount() always returns the same value for the same argument? It
probably wouldn't, since that would break many existing benchmarks.

Maybe more reasonably worrisome: If sum is initialized and incremented but
never read, could all references to it be optimized away? That's easy
enough to avoid, as is the optimization I worried about above:

var sum = 0 // Avoid dead code elimination
const firstInput uint64 = 0x1234567890abcdef

func BenchmarkPopCountSimple(b *testing.B) {
    for i, input := 0, firstInput; i < b.N; i++ {
        sum += PopCount(input)
        input++
    }
    if sum == 0 {
        b.Error("BenchmarkPopCountSimple: sum == 0")
    }
}

Here are my new benchmark results:

$ go version
go version go1.16.5 windows/amd64
$ go test -cpu=1 -bench=.
goos: windows
goarch: amd64
pkg: gopl.io/popcount
cpu: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
BenchmarkPopCountSimple         271004790                4.419 ns/op
BenchmarkPopCountSlowSimple          278           4235890 ns/op
BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple
272759134                4.434 ns/op
BenchmarkPopCountNearlySimple/PopCountNearlySimple
 145613077                8.172 ns/op
PASS
ok      gopl.io/popcount        7.061s

The anomalistically-low "simple" and "almost simple" results are now more
reasonable, but still nearly a factor of two lower than the "nearly simple"
results. They're still inlining the calls, as is the "slow simple"
benchmark, where the "nearly simple" one isn't:

$ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to
PopCount'
.\popcount_test.go:10:18: inlining call to PopCount
.\popcount_test.go:22:19: inlining call to PopCount
.\popcount_test.go:34:19: inlining call to PopCount

Overkill, right? --PSRC

P.S.: For better or worse, I discovered I can -- but shouldn't, based on
the feedback I got! -- get "fancy" formatting by copying from vscode and
pasting into Gmail.


On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote:

> Here is a more definitive reply than my original reply.
>
> I got as far as this
>
> func BenchmarkPopCountSimple(b *testing.B) {
>     sum := 0 // Avoid dead code elimination.
>     for i := 0; i < b.N; i++ {
>         sum += PopCount(0x1234567890abcdef)
>     }
> }
>
> As you can see from the objdump, BenchmarkPopCountSimple dead code is
> eliminated.
>
> func BenchmarkPopCountSimple(b *testing.B) {
>   0x4e38c0        31c9            XORL CX, CX
>
>     for i := 0; i < b.N; i++ {
>   0x4e38c2        eb03            JMP 0x4e38c7
>   0x4e38c4        48ffc1            INCQ CX
>   0x4e38c7        48398890010000        CMPQ CX, 0x190(AX)
>   0x4e38ce        7ff4            JG 0x4e38c4
> }
>   0x4e38d0        c3            RET
>
> I added an additional BenchmarkPopCountAlive benchmark.
>
> var sum = 0 // Avoid dead code elimination.
>
> func BenchmarkPopCountAlive(b *testing.B) {
>     sum = 0
>
>     for i := 0; i < b.N; i++ {
>         sum += PopCount(0x1234567890abcdef)
>     }
> }
>
> As you can see from the objdump, BenchmarkPopCountAlive code is not
> eliminated.
>
> For details, see the popcount.txt attachment.
>
> Peter
>
> On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote:
>
>> This is not a serious problem, but it surprised me. (By the way, how can
>> I post a message with code formatting?)
>>
>> I'd like to create table-driven benchmarks:
>>     https://blog.golang.org/subtests.
>>
>> To that end, I started with code from *The Go Programming Language*:
>>
>> // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING
>> LANGUAGE.
>> // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
>> // License: https://creativecommons.org/licenses/by-nc-sa/4.0/
>>
>> package popcount
>>
>> // pc[i] is the population count of i.
>> var pc [256]byte
>>
>> func init() {
>> for i := range pc {
>> pc[i] = pc[i/2] + byte(i&1)
>> }
>> }
>>
>> // PopCount returns the population count (number of set bits) of x.
>> func PopCount(x uint64) int {
>> return int(pc[byte(x>>(0*8))] +
>> pc[byte(x>>(1*8))] +
>> pc[byte(x>>(2*8))] +
>> pc[byte(x>>(3*8))] +
>> pc[byte(x>>(4*8))] +
>> pc[byte(x>>(5*8))] +
>> pc[byte(x>>(6*8))] +
>> pc[byte(x>>(7*8))])
>> }
>>
>> I then wrote a simple test:
>>
>> // popcount_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountSimple(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> }
>>
>> Because I was paranoid about dead code elimination, I wrote an identical
>> test that calls the function many times (inside the loop to call it b.N
>> times, which should make no difference if the hypothetically-still-dead
>> code is being eliminated):
>>
>> // popcount_slow_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountSlowSimple(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> // Exactly as before, but call the function many times.
>> for j:= 0; j < 1_000_000; j++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> }
>> }
>>
>> Then I wrote an "almost simple" test that uses testing.B.Run() but is
>> hardwired to call the function being benchmarked:
>>
>> // popcount_almost_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountAlmostSimple(b *testing.B) {
>> b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> })
>> }
>>
>> Finally, I wrote a "nearly-simple" test that passes arguments to Run()
>> that could have come from a table:
>>
>> // popcount_nearly_simple.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountNearlySimple(b *testing.B) {
>> f := PopCount
>> name := "PopCountNearlySimple"
>> b.Run(name, func(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += f(0x1234567890abcdef)
>> }
>> })
>> }
>>
>> The simple and almost-simple results are nearly identical (happily, the
>> slow results are not), but the nearly-simple results are an order of
>> magnitude slower:
>>
>> $ go version
>> go version go1.16.4 windows/amd64
>> $ go test -cpu=1 -bench=.
>> goos: windows
>> goarch: amd64
>> pkg: gopl.io/popcount
>> cpu: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
>> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple
>>  1000000000               0.6102 ns/op
>> BenchmarkPopCountNearlySimple/PopCountNearlySimple
>> 194925662                6.197 ns/op
>> BenchmarkPopCountSimple
>>  1000000000               0.5994 ns/op
>> BenchmarkPopCountSlowSimple
>>    1953            606194 ns/op
>> PASS
>> ok      gopl.io/popcount        4.534s
>>
>> After reading this article:
>>
>>
>> https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be
>>
>> I ran:
>>
>> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to
>> PopCount'
>> .\popcount_almost_simple_test.go:9:19: inlining call to PopCount
>> .\popcount_simple_test.go:8:18: inlining call to PopCount
>> .\popcount_slow_simple_test.go:10:19: inlining call to PopCount
>>
>> That makes sense. 0.6 ns is less than a function call. The simple and
>> almost-simple benchmarks can inline the call to the hardwired function, but
>> the nearly-simple benchmark can't.
>>
>> In practice, this isn't much of a problem. Any function small enough to
>> be inlined is unlikely to be a performance bottleneck. If it ever is, a
>> non-table-driven benchmark can still measure it.
>>
>> Hope this helps. --PSRC
>>
>> P.S.: Out of curiosity, how can I post a message with fancy code examples
>> like this one?
>>     https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com.

popcount_test.go
Description: Binary data

[go-nuts] Re: Table-driven benchmarks defeat inlining

Reply via email to