https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117655
Bug ID: 117655
Summary: std::string::swap() could be much faster and smaller
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redbeard0531 at gmail dot com
Target Milestone: ---

TL;DR: it should swap the object (or value) representations unconditionally, then do (ideally branchless) fixups for strings that were self-referential. It could generate 20 lines of branchless x86_64 asm assembling to 43 bytes of instructions, while today it is 426 very branchy lines and 1424 bytes.

The current implementation branches into 6(!) different implementations based on whether the source and destination are local (i.e. self-referential) strings and on whether those self-referential strings are empty. Furthermore, when copying the self-referential strings, it uses a variable-sized copy, which becomes a branch tree on the size.

I think that none of that branching is necessary* and that it is likely to be slower on modern hardware than just unconditionally swapping all 32 bytes (or fewer) of the representation, then repointing the pointers to be self-referential when they point to the other's buffer.

This shows the codegen for a hastily written prototype of the optimized swap under the current implementation: https://godbolt.org/z/jrYdWEEsY

* Caveat: I'm only talking about std::string, in particular with the default allocator. The current implementation (or something similar) may still be important for non-default allocators, especially where propagate_on_container_swap is false. However, that shouldn't prevent optimizing for the common case of using the default allocator.
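
To make the proposal concrete, here is a minimal sketch of the "swap everything, then repoint" idea on a toy small-string type. It is not libstdc++'s std::string and not the prototype from the godbolt link: ToyString, toy_init, toy_destroy and toy_swap are hypothetical names, and the pointer/size/union layout is only assumed to resemble a typical small-string-optimized representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical toy type, NOT libstdc++'s std::string. It is kept trivially
// copyable so that whole-object copies are well defined.
struct ToyString {
    char*       data;   // equals local_buf while the string is stored inline
    std::size_t size;
    union {
        char        local_buf[16];  // inline storage for short strings
        std::size_t capacity;       // heap capacity when not stored inline
    };

    bool is_local() const { return data == local_buf; }
};

// Initialise in place. Returning by value would copy the struct and could
// leave `data` pointing into a temporary's buffer.
void toy_init(ToyString& t, const char* s) {
    assert(s != nullptr);
    t.size = std::strlen(s);
    if (t.size < sizeof t.local_buf) {
        std::memcpy(t.local_buf, s, t.size + 1);  // include the terminator
        t.data = t.local_buf;                     // self-referential pointer
    } else {
        t.data = static_cast<char*>(std::malloc(t.size + 1));
        assert(t.data != nullptr);
        std::memcpy(t.data, s, t.size + 1);
        t.capacity = t.size;
    }
}

void toy_destroy(ToyString& t) {
    if (!t.is_local()) std::free(t.data);
}

// Swap the full representations unconditionally, then fix up pointers.
// After the raw swap, a.data holds b's old pointer, so it equals
// b.local_buf exactly when b used to be local (and vice versa).
void toy_swap(ToyString& a, ToyString& b) {
    ToyString tmp = a;
    a = b;
    b = tmp;
    // Written as selects so a compiler is free to emit conditional moves.
    a.data = (a.data == b.local_buf) ? a.local_buf : a.data;
    b.data = (b.data == a.local_buf) ? b.local_buf : b.data;
}
```

Calling toy_swap on one inline string and one heap string should leave the heap pointer untouched in its new owner and make the inline string point at its new owner's own buffer. Whether the two selects actually compile to branchless code depends on the compiler and flags.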