I can reproduce similar behavior on linux-amd64:

$ perf stat ./example.com.test -test.bench=BenchmarkInline -test.benchtime=100000000x
goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
BenchmarkInline-12      100000000               16.78 ns/op

PASS

 Performance counter stats for './example.com.test -test.bench=BenchmarkInline -test.benchtime=100000000x':

          1,691.95 msec task-clock:u              #    1.004 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               352      page-faults:u             #  208.044 /sec
     6,732,752,072      cycles:u                  #    3.979 GHz
    22,405,823,428      instructions:u            #    3.33  insn per cycle
     6,501,294,164      branches:u                #    3.842 G/sec
           149,596      branch-misses:u           #    0.00% of all branches

       1.684677260 seconds time elapsed

       1.692474000 seconds user
       0.004020000 seconds sys



$ perf stat ./example.com.test -test.bench=BenchmarkNoInline -test.benchtime=100000000x
goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
BenchmarkNoInline-12            100000000               10.79 ns/op
PASS

 Performance counter stats for './example.com.test -test.bench=BenchmarkNoInline -test.benchtime=100000000x':

          1,091.71 msec task-clock:u              #    1.005 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               363      page-faults:u             #  332.505 /sec
     4,490,159,750      cycles:u                  #    4.113 GHz
    20,205,764,499      instructions:u            #    4.50  insn per cycle
     6,701,281,015      branches:u                #    6.138 G/sec
           586,073      branch-misses:u           #    0.01% of all branches

       1.086302272 seconds time elapsed

       1.087710000 seconds user
       0.008027000 seconds sys

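The non-inlined version actually executes fewer instructions for the same
benchmark (about 20.2 billion versus 22.4 billion) and reaches a higher IPC
(4.50 versus 3.33 instructions per cycle). That surprises me, because from a
naive reading of the disassembly the inlined version looks much more compact.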
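
To put the two runs on a per-iteration footing, here is a rough sketch that
divides the perf totals above by the iteration count from
-test.benchtime=100000000x. The type and function names (counters, perOp) are
made up for illustration. The totals also include process startup and test
harness overhead, so the per-op figures are approximations, not exact loop
costs.

```go
package main

import "fmt"

// counters holds the user-space totals copied from one `perf stat` run above.
// The struct itself is a hypothetical helper, not part of any perf or Go API.
type counters struct {
	name         string
	cycles       uint64
	instructions uint64
	branches     uint64
}

// perOp divides a whole-run counter total by the benchmark iteration count.
func perOp(total, iterations uint64) float64 {
	return float64(total) / float64(iterations)
}

func main() {
	// Iteration count taken from -test.benchtime=100000000x.
	const iterations = 100_000_000

	runs := []counters{
		{"BenchmarkInline", 6_732_752_072, 22_405_823_428, 6_501_294_164},
		{"BenchmarkNoInline", 4_490_159_750, 20_205_764_499, 6_701_281_015},
	}
	for _, r := range runs {
		fmt.Printf("%-18s %7.2f cycles/op %7.2f insn/op %6.2f branches/op\n",
			r.name,
			perOp(r.cycles, iterations),
			perOp(r.instructions, iterations),
			perOp(r.branches, iterations))
	}
}
```

By this measure the inlined run costs roughly 224 instructions and 67 cycles
per iteration, against roughly 202 instructions and 45 cycles for the
non-inlined run, so the gap shows up in both instruction count and IPC.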


On Fri, Jul 22, 2022 at 5:52 AM eric...@arm.com <eric.f...@arm.com> wrote:

> For this piece of code, the two test functions are the same, but one is
> inlined and the other is not. However, the inlined version is about 25%
> slower than the non-inlined version on an Apple M1 chip. Why is that?
>
> The code is here https://go.dev/play/p/0NkLMtTZtv4
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/527264d7-7cc1-4278-9a29-c04eb3ec4e86n%40googlegroups.com
> <https://groups.google.com/d/msgid/golang-nuts/527264d7-7cc1-4278-9a29-c04eb3ec4e86n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

