On 4/25/20 4:49 PM, André Pönitz wrote:
Spam detection software, running on the system "mx.qt-project.org",
has identified this incoming email as possible spam.  The original

[*SIGHS*]


We all know the story that began with

    "We knew for a long time that QList is not a good default
    container, despite what the documentation claims. The problem
    boils down to the fact that for a lot of types T, QList<T> is
    needlessly inefficient by allocating elements on the heap and
    storing pointers to them instead of storing the elements
    in-place, like e.g. QVector does. Sometimes, for large and
    complex types, that might be exactly what you want, but a
    conservative estimate would put that chance at less than 5%." [1]


I was curious how this "conservative" estimate of "less than 5%" manifests in real QList object instances in a real world example.

My random picks of real world examples usually end up with "Loading
the Qt Creator project in Qt Creator itself", and "statistics" is
something I make up myself, in this case it is looking at QList objects
at destruction time, and sorting them a bit according to size of
the items in the list, number of items still present at that time etc.

Besides being simple to do it's rather close to the typical use I see,
where containers are filled up, accessed/used a few times and destroyed
without much intermediate size changes.


Here is what I get:

* total QList objects created : 51.631.363 - destructors that only bump ref : 38.015.673 73.6% - destructors that actually dealloc : 13.615.690 26.4%

The 13.615.690 instances with actual deallocations ordered by the size of the item type:

    size     occurences

     4         1.656        #  0.01 %   with internal padding
8 13.424.228 # 98.59 % objects with ideal size,
     12             3    \
     16        74.979     |
     24        60.560     |
     28           126     |
     32        22.358     |
     40         2.054     |
     48        18.786     |
     56            71     >  # 1.40 %  with indirection
     64             3     |
     72            46     |
     80         3.484     |
96 1 | 112 7.264 | 128 52 | 184 2 |
     2112          17    /


From the 8-byte objects we have
13.420.883 stored directly # 99.975 % of cases
             3.345  stored indirectly    #   0.025 %

i.e. in 98.57% of all cases, QList objects behave optimally.

Before we enter confirmation bias here: could you also offer a breakdown of the actual types involved? Basically, there's some important things to discuss here:

1) First and foremost, in Qt 6 QString, QByteArray, QVector, are bigger than a pointer (3 times a pointer size). So, in Qt 6:

* either QList stays unchanged, and now we heap allocate each element for those cases too (thus it's necessary to know how the above statistics change); or

* QList gets adapted so that its internal array allocates 3 * sizeof(void*) per element, so that e.g. Q6StringList won't require a per-item allocation. This would then also avoid allocations for other datatypes, e.g. QStringView, QImage, maybe QVariant and QColor; but of course waste a ton of space for the ones which remain small (most of Qt implicitly shared datatypes).


2) Please re-run the count also for the QVector instances in Creator. Under the "typical use" defined above, they could be converted to QLists, couldn't they? How much would the totals change?


3) How many datatypes are defined by Creator (so they could be considered "user defined", for the purposes of this exercise; although I believe Creator also employs pimpl quite aggressively to keep BC, or am I wrong here?), and how many are coming from Qt?

The confirmation bias comes from the fact that Qt datatypes are designed to work nicely with QList; this for at least two good reasons:

a) Qt value types are normally pimpled (because of refcounting, or ABI stability) so they fit exactly in QList's array slots, but user-defined types aren't necessarily pimpled. Thus the bet: they're almost always the wrong size for QList.

b) We've spent an awful amount of time reviewing Qt's source code and tagging down every value type as movable (where it made sense). And I'd like to extend a big thank you to Sérgio for clazy, Marc for tirelessly fixing the code, Thiago for giving us relocatable types before Qt 6 in QVector-but-not-QList-as-that's-BIC. Users simply forget the dreaded typeinfo macro and so, for their types, QList is always in array-of-pointers mode no matter the size of the datatype.


This is the area I fundamentally consider the "original sin" more than anything else: QList can be a terrible default choice for end users and their datatypes, while it's possibly still a good one for Qt and its datatypes.

So what, we use it, then users can choose whatever they want? Sounds nice in theory, but in practice the choice of QList percolates down into user code -- at least up to the "UI layer", if not even further down. Qt examples themselves contain code using QList over "wrong" user types!

(And I didn't even start explore the teachability "fun" coming with this.)

In conclusion, if QList is a good choice for Qt's own types, and even if we then expose it at the API level as the Qt container of choice, its only advantage then over QVector is going to be prepending? How often is that a use case to justify the semantic burden of this extra container?


This high percentage differs from the artificial benchmark scenarios not
by accident. Almost all of these lists are lists of pointers, a natural
ingredient of tree-like structures, common in user interfaces.

Lists of _actual_ pointers, or lists of refcounted types? That's why I thoughts types were interesting to know. If it's the latter, unless these types are changing in Qt 6 (like QString), the only difference here w.r.t. using QVector would be prepending?


Overall there are 333 different item types, grouped by size of type there is:

          7    <  8    #  2.1 %  of types
        122   ==  8    # 36.6 %  of types
        204    >  8    # 61.3 %  of types



What I see here is a general-purpose random-access container with cheaper
insertion and deletion at front and in the middle than *vector provides for
61.3% of the types,

This cannot be claimed as a closed result: for insertion, it's ignoring the cost of the individual allocation of the newly inserted item, that needs to be traded off the moving of more bytes in memory.


Thanks for the scientific approach,
--
Giuseppe D'Angelo | [email protected] | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

_______________________________________________
Development mailing list
[email protected]
https://lists.qt-project.org/listinfo/development

Reply via email to