https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101406
Bug ID: 101406
Summary: shared_ptr in _S_atomic mode still uses __atomic_add_dispatch()
Product: gcc
Version: 11.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: marc.mutz at kdab dot com
Target Milestone: ---

Consider

    // https://godbolt.org/z/efTW6MoEh
    void test_copy(const std::shared_ptr<int> &sp) {
        auto copy = sp;
    }

vs.

    // https://godbolt.org/z/3aoGq1f9P
    void test_copy(const boost::shared_ptr<int> &sp) {
        auto copy = sp;
    }

In the first case, over 70 lines of assembler are emitted; in the second, around 30.

This seems to be in large part because in _Sp_counted_base::_M_add_ref_copy(), you're using __atomic_add_dispatch() even if _Lp is _S_atomic. It seems to me that a specialisation of this function template for _S_atomic calling just __atomic_add() is missing:

https://godbolt.org/z/crPz9hGe7

Probably the same for the deref case, too.
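
For illustration only, the shape of the suggested specialisation can be sketched without libstdc++ internals. Everything below is a hypothetical stand-in (LockPolicy, CountedBase, atomic_add, atomic_add_dispatch, program_is_threaded), not the libstdc++ code. It assumes the dispatching add decides at run time whether an atomic operation is needed, and it models that check with a plain flag.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the lock policy template parameter.
enum class LockPolicy { Single, Atomic };

// Hypothetical stand-in for a run-time "are threads active?" check.
inline bool& program_is_threaded() {
    static bool flag = false;
    return flag;
}

// Unconditional atomic increment.
inline void atomic_add(std::atomic<int>& counter, int delta) {
    counter.fetch_add(delta, std::memory_order_acq_rel);
}

// Dispatching increment: branches at run time and contains both code paths.
inline void atomic_add_dispatch(std::atomic<int>& counter, int delta) {
    if (program_is_threaded()) {
        atomic_add(counter, delta);
    } else {
        // Plain read-modify-write, only valid when no other thread exists.
        counter.store(counter.load(std::memory_order_relaxed) + delta,
                      std::memory_order_relaxed);
    }
}

template <LockPolicy Lp>
struct CountedBase {
    std::atomic<int> use_count{1};

    // Generic version: goes through the run-time dispatch.
    void add_ref_copy() { atomic_add_dispatch(use_count, 1); }
};

// Specialisation for the atomic policy: no branch, just the atomic add.
template <>
inline void CountedBase<LockPolicy::Atomic>::add_ref_copy() {
    atomic_add(use_count, 1);
}

// Small usage check: both variants increment the count from 1 to 2.
inline void demo() {
    CountedBase<LockPolicy::Single> single_counted;
    CountedBase<LockPolicy::Atomic> atomic_counted;
    single_counted.add_ref_copy();
    atomic_counted.add_ref_copy();
    assert(single_counted.use_count.load() == 2);
    assert(atomic_counted.use_count.load() == 2);
}
```

In this sketch the specialised member compiles to the atomic increment alone, while the generic member carries the flag test and both branches. That is the kind of difference in emitted code the report points at.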