On 2017-03-29 16:41, Matthew Woehlke wrote:
> On 2017-03-29 07:26, Marc Mutz wrote:
>> That brings us straight back to the fundamental question: why can the C++ world at large cope with containers that are not CoW, and Qt cannot? The only answer I have is "because Qt never tried". And that's the end of it. I have pointed to Herb's string measurements from a decade or two ago. I have shown that copying a std::vector of up to 1K ints is faster than copying a QVector when hammered by at least two threads.

> 4 KiB of memory is not very much. What happens if you have larger
> objects (say, 100 objects of 96 bytes each)?

The same. QVector's reference count is atomic, which acts like a hardware mutex around the count's cache line: only one core at a time can have write access to a given cache line. So the rate at which you can update the ref count is limited by the rate at which a single core can update it (in memory), divided by a factor that accounts for cache-line ping-pong, which can be as high as 2.

Deep-copying does not write to the source object, and any number of cores can share read access to a given cache line, each with its own copy, so deep-copying scales linearly with the number of cores.

Therefore, for any given element size and count there exists a thread count at which deep-copying becomes faster than CoW. Yes, even for 1K objects of 1K bytes each.
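
A minimal sketch of such a measurement (an illustration, not the benchmark referred to above; the thread count, iteration count, and 1K-int payload are arbitrary choices):

  #include <atomic>
  #include <chrono>
  #include <cstdio>
  #include <functional>
  #include <thread>
  #include <vector>

  int main()
  {
      const int kThreads = 4;          // arbitrary choices for illustration
      const int kIterations = 1000000;
      const std::vector<int> source(1024, 42); // the "1K ints" case
      std::atomic<int> refCount(1);    // stands in for the shared ref count
      std::atomic<long> sink(0);       // keeps the deep copies observable

      auto cowCopy = [&]() {           // a CoW "copy" + "destroy" pair
          for (int i = 0; i < kIterations; ++i) {
              refCount.fetch_add(1, std::memory_order_relaxed);
              refCount.fetch_sub(1, std::memory_order_acq_rel);
          }
      };
      auto deepCopy = [&]() {          // a value-semantics copy
          long sum = 0;
          for (int i = 0; i < kIterations; ++i) {
              std::vector<int> copy = source; // reads shared state, writes none
              sum += copy.back();
          }
          sink += sum;
      };

      auto measure = [&](const std::function<void()> &f) {
          const auto start = std::chrono::steady_clock::now();
          std::vector<std::thread> pool;
          for (int i = 0; i < kThreads; ++i)
              pool.emplace_back(f);
          for (std::thread &t : pool)
              t.join();
          return std::chrono::duration<double>(
                     std::chrono::steady_clock::now() - start).count();
      };

      // Which side wins depends on element size/count and core count; the
      // point is that the ref-count side cannot get faster with more cores.
      std::printf("shared ref count: %.2fs\ndeep copies:      %.2fs\n",
                  measure(cowCopy), measure(deepCopy));
  }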

> What if you have an API that needs value semantics (keep in mind one
> benefit of CoW is implicit shared lifetime management) but tends not to
> actually modify the "copied" list?

std::vector has value semantics. QVector's CoW, on the other hand, leaks its reference semantics: e.g. if you take an iterator into a container, copy the container, then write through the iterator, you write to both copies.
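
Spelled out, as a self-contained reproduction:

  #include <QVector>
  #include <QDebug>

  int main()
  {
      QVector<int> a{1, 2, 3};
      QVector<int>::iterator it = a.begin(); // a isn't shared yet, so no detach
      const QVector<int> b = a;              // b now shares a's data block
      *it = 42;                              // writes into the block a and b share
      qDebug() << a.at(0) << b.at(0);        // prints "42 42": the "copy" changed, too
  }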

> What benchmarks have been done on *real applications*? What were the
> results?

What benchmarks have *you* done? The world outside Qt is happily working with CoW-less containers. It's the proponents of CoW who need to show that CoW is a global optimisation, and not just an optimisation for copies of certain element counts and sizes.

>> (I just had to review _another_ pimpl'ed class that contained
>> nothing but two enums.)

> ...and what happens if at some point in the future that class needs
> three enums? Or some other member?

When you start the class, you pack the two values into a bit-field and add reserved space up to a certain size, 4 or 8 bytes. When you run out, you make a V2 and add an overload taking V2. That is perfectly OK, since old code can't use the new API. This doesn't mean you should never use pimpl; it means you shouldn't use it just because you can.
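
A sketch of the pattern (the class and enum names are made up):

  #include <QtGlobal>

  class Style // made-up example, not a Qt class
  {
  public:
      enum Color : quint32 { Red, Green, Blue };
      enum Shape : quint32 { Square, Circle };

      Style(Color c, Shape s) : m_color(c), m_shape(s), m_reserved(0) {}

      Color color() const { return Color(m_color); }
      Shape shape() const { return Shape(m_shape); }

  private:
      quint32 m_color : 4;     // today's two values...
      quint32 m_shape : 4;
      quint32 m_reserved : 24; // ...plus headroom for future members
  };
  Q_STATIC_ASSERT(sizeof(Style) == sizeof(quint32)); // stays a cheap value type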

> What, exactly, do you find objectionable about PIMPL in "modern C++"? It
> can't be that it's inefficient, because performance was never a goal of
> PIMPL.

Performance is always a goal in C++, even in Qt. Otherwise QRect would be pimpled, too.

> so I can't pass it by value into slots.

Why would you want to? No-one does that. People use cref, as for all large types. Qt makes sure that a copy is taken only when needed, i.e. when the slot runs in a different thread than the emitter. That is very rare, and in those situations people can be expected to pass a shared_ptr<vector> instead.
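
A sketch of that pattern for the queued case ("Payload" and "Producer" are made-up names):

  #include <QMetaType>
  #include <QObject>
  #include <memory>
  #include <vector>

  using Payload = std::shared_ptr<const std::vector<int>>;
  Q_DECLARE_METATYPE(Payload) // lets queued connections marshal the type

  class Producer : public QObject
  {
      Q_OBJECT
  signals:
      // A direct (same-thread) connection passes the cref straight through;
      // a queued (cross-thread) one copies the shared_ptr, not the vector.
      void dataReady(const Payload &data);
  };

  // Once at startup, before the first cross-thread emission:
  //   qRegisterMetaType<Payload>("Payload");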

> This (passing lists across thread boundaries in signals/slots) happens
> quite a bit in https://github.com/kitware/vivia/. Doing so is a
> fundamental part of the data processing architecture of at least two of
> the applications there.

Qt supports thousands of applications. We shouldn't optimize for corner cases.

> Also, explicit sharing borders on premature pessimization. If my slot
> needs to modify the data, I have to go out of my way to avoid making an
> unnecessary copy. (This argument would be more compelling if C++ had a
> cow_ptr.)

You got that the wrong way around.

Thanks,
Marc

_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
