https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117655

            Bug ID: 117655
           Summary: std::string::swap() could be much faster and smaller
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

TL;DR: it should swap the object (or value) representations unconditionally,
then do (ideally branchless) fixups for strings that were self-referential.
That could compile to 20 lines of branchless x86_64 asm (43 bytes of
instructions), whereas today it is 426 very branchy lines (1424 bytes).

The current implementation branches into 6(!) different implementations based
on whether the source and destination are local (i.e. self-referential) strings
and on whether those self-referential strings are empty. Furthermore, when
copying the self-referential strings, it uses a variable-sized copy, which
becomes a branch tree on the size. I think none of that branching is
necessary*, and it is likely to be slower on modern hardware than just
unconditionally swapping all 32 bytes (or fewer) of the representation, then
repointing each pointer at its own buffer when it points to the other
string's buffer.
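
The swap-then-fix-up idea can be sketched on a simplified model. This is not
libstdc++'s std::string: sso_string, init, destroy and fast_swap are
hypothetical names, and the layout (pointer, length, 16-byte union of inline
buffer and heap capacity, 32 bytes on a typical 64-bit target) is an
assumption chosen to resemble the real one. The swap always copies the full
fixed-size representation, then repoints any pointer that now targets the
other object's inline buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical SSO string model, laid out roughly like libstdc++'s
// std::string (assumption): data pointer, length, then a union holding
// either the inline (local) buffer or the heap capacity.
struct sso_string {
    static constexpr std::size_t local_capacity = 15;
    char* ptr;
    std::size_t len;
    union {
        char local_buf[local_capacity + 1];
        std::size_t capacity;
    };

    bool is_local() const { return ptr == local_buf; }
};

// Build a string in place. Returning by value would copy a self-referential
// pointer that still points at the temporary's buffer.
void init(sso_string& s, const char* text) {
    assert(text != nullptr);
    s.len = std::strlen(text);
    if (s.len <= sso_string::local_capacity) {
        s.ptr = s.local_buf;
    } else {
        s.ptr = new char[s.len + 1];
        s.capacity = s.len;
    }
    std::memcpy(s.ptr, text, s.len + 1);  // copy including the terminator
}

void destroy(sso_string& s) {
    if (!s.is_local()) delete[] s.ptr;
}

// Exchange the whole object representation with fixed-size copies (no
// branching on length or locality), then fix up self-referential pointers.
void fast_swap(sso_string& a, sso_string& b) noexcept {
    unsigned char ta[sizeof(sso_string)];
    unsigned char tb[sizeof(sso_string)];
    // Going through two temporaries keeps self-swap (&a == &b) well-defined.
    std::memcpy(ta, &a, sizeof ta);
    std::memcpy(tb, &b, sizeof tb);
    std::memcpy(&a, tb, sizeof tb);
    std::memcpy(&b, ta, sizeof ta);

    // A pointer that now targets the other object's inline buffer must point
    // at this object's own inline buffer. Written as selects so a compiler
    // may emit conditional moves rather than branches.
    a.ptr = (a.ptr == b.local_buf) ? a.local_buf : a.ptr;
    b.ptr = (b.ptr == a.local_buf) ? b.local_buf : b.ptr;
}
```

Heap pointers never alias either object's inline buffer, so they pass through
the fixups unchanged; only strings that were local get repointed.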

This shows the codegen for a hastily written prototype of the optimized swap,
built on the current string representation: https://godbolt.org/z/jrYdWEEsY

* Caveat: I'm only talking about std::string, in particular with the default
allocator. The current implementation (or something similar) may still be
important for non-default allocators, especially where
propagate_on_container_swap is false. However, that shouldn't prevent
optimizing for the common case of using the default allocator.
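
Restricting the fast path to the default allocator could be a compile-time
choice. The sketch below is an assumption, not libstdc++ code:
fast_swap_eligible, swap_path and choose_swap_path are hypothetical names, and
it only decides which path would be taken (requires C++17). It also records
that std::allocator reports propagate_on_container_swap as false; what makes
it unproblematic is that all of its instances compare equal.

```cpp
#include <memory>
#include <memory_resource>
#include <string>
#include <type_traits>

// Facts about the default allocator that the fast path would rely on.
using default_alloc_traits = std::allocator_traits<std::allocator<char>>;
static_assert(default_alloc_traits::is_always_equal::value,
              "std::allocator instances always compare equal");
static_assert(!default_alloc_traits::propagate_on_container_swap::value,
              "std::allocator does not propagate on swap");

// Hypothetical trait: eligible only for std::allocator<CharT>, the case
// the report targets.
template <class CharT, class Alloc>
inline constexpr bool fast_swap_eligible =
    std::is_same_v<Alloc, std::allocator<CharT>>;

enum class swap_path { bytewise, generic };

// Hypothetical dispatcher: which swap path a string type would get.
template <class String>
constexpr swap_path choose_swap_path() {
    using CharT = typename String::value_type;
    using Alloc = typename String::allocator_type;
    if constexpr (fast_swap_eligible<CharT, Alloc>) {
        return swap_path::bytewise;  // unconditional swap + fixups
    } else {
        return swap_path::generic;   // keep the allocator-aware code path
    }
}
```

With this, std::string and std::wstring would take the bytewise path, while
std::pmr::string (polymorphic_allocator) would keep the existing code.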
