I've read that C++ abandoned CoW for `std::string` because the atomic ref-counting turned out to be more expensive on average than simply copying the string every time. But of course YMMV; the tradeoff depends on the size of the objects and how expensive they are to copy.
And for objects that won't be used concurrently on multiple threads, one can drop down to plain old ints for the refcounts (and skip the fences). That's probably the approach I'll use. I do need threads, but I'm going to try my best to enforce move semantics (à la Rust) for objects being sent between threads, so that a refcounted object is only ever touched by one thread at a time.
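
To make that concrete, here's a rough C++ sketch of what I have in mind. All the names (`LocalCowString`, `length_on_worker`) are made up for illustration, and C++ can't enforce the "no sharing across threads" rule at compile time the way Rust's `Send` does, so this version assumes a runtime check that the handle is unshared before it's moved to another thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical thread-local CoW handle. The refcount is a plain integer,
// so copying is cheap, but a block must only be touched by one thread at a time.
class LocalCowString {
    struct Block {
        std::string data;
        std::size_t refs;  // deliberately non-atomic
    };
    Block* block_;

public:
    explicit LocalCowString(std::string s) : block_(new Block{std::move(s), 1}) {}

    // Copy shares the block and bumps the plain counter.
    LocalCowString(const LocalCowString& other) : block_(other.block_) {
        if (block_) ++block_->refs;
    }

    // Move steals the block and leaves the source empty; no count change.
    LocalCowString(LocalCowString&& other) noexcept
        : block_(std::exchange(other.block_, nullptr)) {}

    // Unified copy/move assignment via copy-and-swap.
    LocalCowString& operator=(LocalCowString other) noexcept {
        std::swap(block_, other.block_);
        return *this;
    }

    ~LocalCowString() {
        if (block_ && --block_->refs == 0) delete block_;
    }

    const std::string& read() const { return block_->data; }

    // Copy-on-write: detach into a private block only if someone else shares it.
    std::string& write() {
        if (block_->refs > 1) {
            Block* fresh = new Block{block_->data, 1};
            --block_->refs;
            block_ = fresh;
        }
        return block_->data;
    }

    std::size_t use_count() const { return block_ ? block_->refs : 0; }
};

// Hands the string to a worker thread by move and returns the length the
// worker observed. Refuses shared handles, since the count isn't atomic.
std::size_t length_on_worker(LocalCowString s) {
    if (s.use_count() != 1)
        throw std::logic_error("only unshared handles may cross threads");
    std::size_t len = 0;
    std::thread worker(
        [&len](LocalCowString owned) { len = owned.read().size(); },
        std::move(s));
    worker.join();  // join synchronizes, so reading len afterwards is safe
    return len;
}
```

Calling `length_on_worker(std::move(str))` transfers ownership without touching the count, and the thread start and `join()` supply the synchronization that the plain counter lacks. Passing a handle that still has other copies alive throws instead of silently racing on the counter.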
