https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86008

--- Comment #2 from Matt Whitlock <gcc at mattwhitlock dot name> ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Matt Whitlock from comment #0)
> > The following shim allows the code above to compile, although it is
> > sub-optimal because it captures a std::basic_string_view by reference.
> 
> Have you profiled to see if that matters?

No, mostly because I find micro-benchmarks to be unreliable indicators of
real-world performance. Instead, I prefer to compare the generated code.

I've made a simplified comparison that boils down the essence of what's going
on in the two cases.

/* BEGIN CODE */

#include <ostream>
#include <string_view>

struct sv_ref {
        const std::string_view &sv;
};

struct sv_val {
        std::string_view sv;
};

std::ostream & operator << (std::ostream &os, const sv_ref &svr) {
        // os.put('"');
        for (auto c : svr.sv) {
                if (c == '"' || c == '\\') {
                        os.put('\\');
                }
                os.put(c);
        }
        // os.put('"');
        return os;
}

std::ostream & operator << (std::ostream &os, const sv_val &svv) {
        // os.put('"');
        for (auto c : svv.sv) {
                if (c == '"' || c == '\\') {
                        os.put('\\');
                }
                os.put(c);
        }
        // os.put('"');
        return os;
}

/* END CODE */

Plugging this code into Compiler Explorer (godbolt.org) reveals the following
relevant difference:

-operator<<(std::basic_ostream<char, std::char_traits<char> >&, sv_ref const&):
+operator<<(std::basic_ostream<char, std::char_traits<char> >&, sv_val const&):
   push r13
   push r12
   mov r12, rdi
   push rbp
   push rbx
   sub rsp, 8
-  mov rax, QWORD PTR [rsi]     ; rax := &svr.sv
-  mov rbp, QWORD PTR [rax+8]   ; rbp := (&svr.sv)->begin()
-  mov r13, QWORD PTR [rax]     ; r13 := (&svr.sv)->size()
+  mov rbp, QWORD PTR [rsi+8]   ; rbp := svv.sv.begin()
+  mov r13, QWORD PTR [rsi]     ; r13 := svv.sv.size()
   add r13, rbp
   cmp rbp, r13

As expected, the pass-by-reference case requires one additional indirect load
from memory versus the pass-by-value case.

However, this is offset by what happens on the calling side.

/* BEGIN CODE */

#include <iostream>
#include <string_view>

struct sv_ref {
        const std::string_view &sv;
};

struct sv_val {
        std::string_view sv;
};

std::ostream & operator << (std::ostream &os, const sv_ref &svr);
std::ostream & operator << (std::ostream &os, const sv_val &svv);

const std::string_view sv;

void pass_by_ref() {
        std::cout << sv_ref{ sv };
}

void pass_by_val() {
        std::cout << sv_val{ sv };
}

/* END CODE */

Here we see the following relevant difference:

-pass_by_ref():
+pass_by_val():
   sub rsp, 24
   mov edi, OFFSET FLAT:std::cout
-  mov QWORD PTR [rsp+8], OFFSET FLAT:sv
-  lea rsi, [rsp+8]
-  call operator<<(std::basic_ostream<char, std::char_traits<char> >&, sv_ref const&)
+  mov rsi, rsp
+  mov QWORD PTR [rsp], 0
+  mov QWORD PTR [rsp+8], 0
+  call operator<<(std::basic_ostream<char, std::char_traits<char> >&, sv_val const&)
   add rsp, 24
   ret

The pass-by-value case requires one additional store to memory versus the
pass-by-reference case: two stores to build the std::string_view copy on the
stack, versus a single store of the pointer to sv.
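
The store counts follow from the sizes of the two wrappers. Here is a minimal
sketch that reports them. The names `sizes` and `wrapper_sizes` are
hypothetical and not part of the code above, and the exact values depend on
the ABI and the standard library implementation.

```cpp
#include <cstddef>
#include <string_view>

struct sv_ref {
        const std::string_view &sv;   // holds only a reference to a string_view
};

struct sv_val {
        std::string_view sv;          // holds a full copy (length + pointer)
};

// Hypothetical helper type, introduced only for this illustration.
struct sizes {
        std::size_t ref_bytes;
        std::size_t val_bytes;
};

// Reports how many bytes the caller must materialize for each wrapper.
constexpr sizes wrapper_sizes() {
        return { sizeof(sv_ref), sizeof(sv_val) };
}
```

On x86-64 with libstdc++ one would typically expect the by-reference wrapper
to be one pointer wide and the by-value wrapper two words wide, which is
consistent with the one QWORD store versus two QWORD stores in the listing
above.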

So it's a wash. And, in fact, when I combine the two above snippets of source
code into one compilation unit so that the optimizer can work its magic, the
only differences in the generated code between the two cases are the labels.
That's what I would expect too.

So you're right: although the pass-by-reference case is conceptually
sub-optimal, given the typical advice that std::string_view objects should be
passed by value, in practice it is identical.
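
For anyone who wants to repeat the single-compilation-unit comparison, here is
a rough sketch that puts both wrappers and both operators in one file. The
helpers `put_escaped`, `render`, and `same_output` are hypothetical additions
for this illustration. The shared loop is a convenience and is not how the
snippets above were written. This only checks that the two overloads behave
the same; comparing the generated code still requires inspecting the compiler
output.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

struct sv_ref {
        const std::string_view &sv;
};

struct sv_val {
        std::string_view sv;
};

// Hypothetical shared helper: the same escaping loop as in both operators
// above, writing a backslash before each '"' and '\\'.
inline void put_escaped(std::ostream &os, std::string_view s) {
        for (auto c : s) {
                if (c == '"' || c == '\\') {
                        os.put('\\');
                }
                os.put(c);
        }
}

inline std::ostream & operator << (std::ostream &os, const sv_ref &svr) {
        put_escaped(os, svr.sv);
        return os;
}

inline std::ostream & operator << (std::ostream &os, const sv_val &svv) {
        put_escaped(os, svv.sv);
        return os;
}

// Hypothetical helper: streams a wrapper into a string for comparison.
template <typename Wrapper>
std::string render(const Wrapper &w) {
        std::ostringstream oss;
        oss << w;
        return oss.str();
}

// Hypothetical helper: true if both wrappers produce the same text for s.
// s is a named object, so the reference held by sv_ref stays valid here.
inline bool same_output(std::string_view s) {
        return render(sv_ref{ s }) == render(sv_val{ s });
}
```

Because both operators are visible in the same translation unit, the optimizer
is free to inline them at the call sites, which is the situation described
above where only the labels differed.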
