sahib commented on PR #326:
URL: https://github.com/apache/arrow-go/pull/326#issuecomment-2778508876
@zeroshade A minor thing I also noticed that might be worth sharing:
Most of the asm functions are selected via function variables at
runtime (i.e. during `init()` different implementations are assigned based on
CPU features). Overall a nice approach, but it comes with a small performance
penalty, as it prevents inlining (and maybe some other optimizations?).
Consider this mini benchmark:
```go
func BenchmarkTestGreaterThanBitmap(b *testing.B) {
	const N = 10
	levels := make([]int16, N)
	for idx := range levels {
		levels[idx] = int16(idx)
	}

	// Call through the exported function variable set in init().
	b.Run("func", func(b *testing.B) {
		for b.Loop() {
			GreaterThanBitmap(levels, int16(N/2))
		}
	})

	// Call the pure-Go implementation directly.
	b.Run("no-func-go", func(b *testing.B) {
		for b.Loop() {
			greaterThanBitmapGo(levels, int16(N/2))
		}
	})
}
```
```sh
# noasm to make sure that we do not compare against arch specific function
$ go test -tags noasm -bench=. -run=xxx
BenchmarkTestGreaterThanBitmap/func-16          165481227    7.289 ns/op
BenchmarkTestGreaterThanBitmap/no-func-go-16    243919600    4.611 ns/op
```
That difference (roughly 2.7 ns per call here) of course becomes negligible
as the function's runtime increases. But overall it would probably be possible
to squeeze out a few percent of benchmark speed by changing those function
values to something along these lines:
```go
func ExtractBits(...) {
	// Use the BMI2 variant when the CPU supports it, else fall back to Go.
	if cpu.X86.HasBMI2 {
		return extractBitsBMI(...)
	}
	return extractBitsGo(...)
}
```
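To make the comparison concrete, here is a minimal, self-contained sketch of the two dispatch styles side by side. Everything in it is hypothetical and only for illustration: `hasFastPath` stands in for a CPU feature flag such as `cpu.X86.HasBMI2`, and `countGreaterGo` / `countGreaterFast` are stand-in kernels, not functions from arrow-go.

```go
package main

import "fmt"

// hasFastPath is a hypothetical stand-in for a CPU feature flag such as
// cpu.X86.HasBMI2. It is hard-coded to false so the sketch runs anywhere.
var hasFastPath = false

// countGreaterGo is a pure-Go stand-in kernel: it counts values greater than rhs.
func countGreaterGo(levels []int16, rhs int16) int {
	n := 0
	for _, v := range levels {
		if v > rhs {
			n++
		}
	}
	return n
}

// countGreaterFast is a placeholder for an asm-backed variant. In this sketch
// it simply delegates to the Go version.
func countGreaterFast(levels []int16, rhs int16) int {
	return countGreaterGo(levels, rhs)
}

// Style 1: a function variable that init() assigns once.
// Every call goes through this variable, i.e. it is an indirect call.
var countGreaterVar func(levels []int16, rhs int16) int

func init() {
	if hasFastPath {
		countGreaterVar = countGreaterFast
	} else {
		countGreaterVar = countGreaterGo
	}
}

// Style 2: a plain wrapper that branches on the feature flag.
// Both callees are direct calls that the compiler can see.
func CountGreater(levels []int16, rhs int16) int {
	if hasFastPath {
		return countGreaterFast(levels, rhs)
	}
	return countGreaterGo(levels, rhs)
}

func main() {
	levels := []int16{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	fmt.Println(countGreaterVar(levels, 5), CountGreater(levels, 5)) // prints "4 4"
}
```

Both styles return the same result; the difference is only in how the call is dispatched. Building with `go build -gcflags=-m` prints the compiler's inlining decisions, which should help confirm whether the wrapper style actually gets inlined for a given function.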
I pushed a dummy branch here:
https://github.com/sahib/arrow-go/tree/bench/build-tag. The whole platform
selection also seems to get less convoluted with this approach. All in all this
is minor stuff, but I wanted your opinion on it first.