Thanks! I'd seen the "dead code elimination" comment somewhere and questioned it, but not enough.
If I worry about what some future Go compiler might optimize, I end up worrying quite a lot! For example, could this code: func BenchmarkPopCountAlive(b *testing.B) { sum = 0 for i := 0; i < b.N; i++ { sum += PopCount(0x1234567890abcdef) } } hypothetically be optimized to: func BenchmarkPopCountAlive(b *testing.B) { sum = PopCount(0x1234567890abcdef) * b.N } since PopCount() always returns the same value for the same argument? It probably wouldn't, since that would break many existing benchmarks. Maybe more reasonably worrisome: If sum is initialized and incremented but never read, could all references to it be optimized away? That's easy enough to avoid, as is the optimization I worried about above: var sum = 0 // Avoid dead code elimination const firstInput uint64 = 0x1234567890abcdef func BenchmarkPopCountSimple(b *testing.B) { for i, input := 0, firstInput; i < b.N; i++ { sum += PopCount(input) input++ } if sum == 0 { b.Error("BenchmarkPopCountSimple: sum == 0") } } Here are my new benchmark results: $ go version go version go1.16.5 windows/amd64 $ go test -cpu=1 -bench=. goos: windows goarch: amd64 pkg: gopl.io/popcount cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz BenchmarkPopCountSimple 271004790 4.419 ns/op BenchmarkPopCountSlowSimple 278 4235890 ns/op BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple 272759134 4.434 ns/op BenchmarkPopCountNearlySimple/PopCountNearlySimple 145613077 8.172 ns/op PASS ok gopl.io/popcount 7.061s The anomalistically-low "simple" and "almost simple" results are now more reasonable, but still nearly a factor of two lower than the "nearly simple" results. They're still inlining the calls, as is the "slow simple" benchmark, where the "nearly simple" one isn't: $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to PopCount' .\popcount_test.go:10:18: inlining call to PopCount .\popcount_test.go:22:19: inlining call to PopCount .\popcount_test.go:34:19: inlining call to PopCount Overkill, right? --PSRC P.S.: For better or worse, I discovered I can -- but shouldn't, based on the feedback I got! -- get "fancy" formatting by copying from vscode and pasting into Gmail. On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote: > Here is a more definitive reply than my original reply. > > I got as far as this > > func BenchmarkPopCountSimple(b *testing.B) { > sum := 0 // Avoid dead code elimination. > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > As you can see from the objdump, BenchmarkPopCountSimple dead code is > eliminated. > > func BenchmarkPopCountSimple(b *testing.B) { > 0x4e38c0 31c9 XORL CX, CX > > for i := 0; i < b.N; i++ { > 0x4e38c2 eb03 JMP 0x4e38c7 > 0x4e38c4 48ffc1 INCQ CX > 0x4e38c7 48398890010000 CMPQ CX, 0x190(AX) > 0x4e38ce 7ff4 JG 0x4e38c4 > } > 0x4e38d0 c3 RET > > I added an additional BenchmarkPopCountAlive benchmark. > > var sum = 0 // Avoid dead code elimination. > > func BenchmarkPopCountAlive(b *testing.B) { > sum = 0 > > for i := 0; i < b.N; i++ { > sum += PopCount(0x1234567890abcdef) > } > } > > As you can see from the objdump, BenchmarkPopCountAlive code is not > eliminated. > > For details, see the popcount.txt attachment. > > Peter > > On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote: > >> This is not a serious problem, but it surprised me. (By the way, how can >> I post a message with code formatting?) >> >> I'd like to create table-driven benchmarks: >> https://blog.golang.org/subtests. >> >> To that end, I started with code from *The Go Programming Language*: >> >> // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING >> LANGUAGE. >> // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan. >> // License: https://creativecommons.org/licenses/by-nc-sa/4.0/ >> >> package popcount >> >> // pc[i] is the population count of i. >> var pc [256]byte >> >> func init() { >> for i := range pc { >> pc[i] = pc[i/2] + byte(i&1) >> } >> } >> >> // PopCount returns the population count (number of set bits) of x. >> func PopCount(x uint64) int { >> return int(pc[byte(x>>(0*8))] + >> pc[byte(x>>(1*8))] + >> pc[byte(x>>(2*8))] + >> pc[byte(x>>(3*8))] + >> pc[byte(x>>(4*8))] + >> pc[byte(x>>(5*8))] + >> pc[byte(x>>(6*8))] + >> pc[byte(x>>(7*8))]) >> } >> >> I then wrote a simple test: >> >> // popcount_simple_test.go >> package popcount >> >> import "testing" >> >> func BenchmarkPopCountSimple(b *testing.B) { >> sum := 0 // Avoid dead code elimination. >> for i := 0; i < b.N; i++ { >> sum += PopCount(0x1234567890abcdef) >> } >> } >> >> Because I was paranoid about dead code elimination, I wrote an identical >> test that calls the function many times (inside the loop to call it b.N >> times, which should make no difference if the hypothetically-still-dead >> code is being eliminated): >> >> // popcount_slow_simple_test.go >> package popcount >> >> import "testing" >> >> func BenchmarkPopCountSlowSimple(b *testing.B) { >> sum := 0 // Avoid dead code elimination. >> for i := 0; i < b.N; i++ { >> // Exactly as before, but call the function many times. >> for j:= 0; j < 1_000_000; j++ { >> sum += PopCount(0x1234567890abcdef) >> } >> } >> } >> >> Then I wrote an "almost simple" test that uses testing.B.Run() but is >> hardwired to call the function being benchmarked: >> >> // popcount_almost_simple_test.go >> package popcount >> >> import "testing" >> >> func BenchmarkPopCountAlmostSimple(b *testing.B) { >> b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) { >> sum := 0 // Avoid dead code elimination. >> for i := 0; i < b.N; i++ { >> sum += PopCount(0x1234567890abcdef) >> } >> }) >> } >> >> Finally, I wrote a "nearly-simple" test that passes arguments to Run() >> that could have come from a table: >> >> // popcount_nearly_simple.go >> package popcount >> >> import "testing" >> >> func BenchmarkPopCountNearlySimple(b *testing.B) { >> f := PopCount >> name := "PopCountNearlySimple" >> b.Run(name, func(b *testing.B) { >> sum := 0 // Avoid dead code elimination. >> for i := 0; i < b.N; i++ { >> sum += f(0x1234567890abcdef) >> } >> }) >> } >> >> The simple and almost-simple results are nearly identical (happily, the >> slow results are not), but the nearly-simple results are an order of >> magnitude slower: >> >> $ go version >> go version go1.16.4 windows/amd64 >> $ go test -cpu=1 -bench=. >> goos: windows >> goarch: amd64 >> pkg: gopl.io/popcount >> cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz >> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple >> 1000000000 0.6102 ns/op >> BenchmarkPopCountNearlySimple/PopCountNearlySimple >> 194925662 6.197 ns/op >> BenchmarkPopCountSimple >> 1000000000 0.5994 ns/op >> BenchmarkPopCountSlowSimple >> 1953 606194 ns/op >> PASS >> ok gopl.io/popcount 4.534s >> >> After reading this article: >> >> >> https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be >> >> I ran: >> >> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to >> PopCount' >> .\popcount_almost_simple_test.go:9:19: inlining call to PopCount >> .\popcount_simple_test.go:8:18: inlining call to PopCount >> .\popcount_slow_simple_test.go:10:19: inlining call to PopCount >> >> That makes sense. 0.6 ns is less than a function call. The simple and >> almost-simple benchmarks can inline the call to the hardwired function, but >> the nearly-simple benchmark can't. >> >> In practice, this isn't much of a problem. Any function small enough to >> be inlined is unlikely to be a performance bottleneck. If it ever is, a >> non-table-driven benchmark can still measure it. >> >> Hope this helps. --PSRC >> >> P.S.: Out of curiosity, how can I post a message with fancy code examples >> like this one? >> https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ >> > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com.
popcount_test.go
Description: Binary data