Hi,

I went to the drawing board and drew up a variant string view class. It's here: https://codereview.qt-project.org/c/qt/qtbase/+/301594

Here's why I think we need it. At the end of the email, I also suggest how we should go about introducing it into Qt.

Thiago and Lars are by now convinced that we need a QUtf8StringView, too. Lars sees some merit in a variant type for low-level APIs; Thiago remains unconvinced.

I have come to believe that QUtf8StringView won't fly without QAnyStringView: introducing the former alone will explode the number of mixed-type operations we need to support. If we don't remove anything, we're talking about

- QString
- QStringRef*
- QStringView
- QByteArray
- QByteArrayView
- QUtf8StringView
- QLatin1String
- char16_t
- QChar
- char8_t
- char
- QLatin1Char
- const char*
- const char16_t*
- const char8_t*

and anything I've forgotten. The best we can do to condense this is to revoke the string-ness of QByteArray, which would leave us with

- QStringView
- QLatin1String
- QUtf8StringView
- QChar

The latter would have to accept plain char again, something we ASCII_DEPRECATED years ago, but which should be reconsidered under the new source-is-UTF-8 paradigm.

Lars would probably say that we could also drop QLatin1String, with which I disagree[1].

Assuming for the sake of argument that we need those four types, consider QString::replace(). Experience shows that, for arguments like QStringBuilder expressions to be accepted, an actual QString overload needs to be present, too. That makes five argument types, so, ignoring existing overloads and regexp, we'd need 5x5=25 overloads, one per combination of the two arguments. I won't enumerate them here. What I will enumerate is the complete set of overloads when using QAnyStringView:

QString& QString::replace(QAnyStringView, QAnyStringView, Qt::CaseSensitivity);

That's it.
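To illustrate why one signature can stand in for the whole 5x5 matrix, here is a minimal sketch in standard C++. Every name in it (Latin1, AnyView, toUtf16, replaceAll) is hypothetical and this is not how QAnyStringView is implemented: a non-owning view records which encoding its data is in, has implicit constructors from the usual string types, and a single replace-style function normalizes both arguments to UTF-16 before doing the work. It assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for QLatin1String: marks bytes as Latin-1.
struct Latin1 { std::string_view s; };

// Hypothetical toy variant view (NOT QAnyStringView): refers to Latin-1,
// UTF-8 or UTF-16 data and remembers which one.
class AnyView {
public:
    enum class Tag { L1, U8, U16 };
    AnyView(Latin1 l) : m_bytes(l.s), m_tag(Tag::L1) {}
    AnyView(const char *u8) : m_bytes(u8), m_tag(Tag::U8) {}
    AnyView(const std::string &u8) : m_bytes(u8), m_tag(Tag::U8) {}
    AnyView(const char16_t *u16) : m_u16(u16), m_tag(Tag::U16) {}
    AnyView(const std::u16string &u16) : m_u16(u16), m_tag(Tag::U16) {}
    Tag tag() const { return m_tag; }
    std::string_view bytes() const { return m_bytes; }    // L1 and U8
    std::u16string_view utf16() const { return m_u16; }   // U16
private:
    std::string_view m_bytes;
    std::u16string_view m_u16;
    Tag m_tag;
};

// Centralized conversion to UTF-16. Assumes well-formed UTF-8 and stops
// early on a truncated multi-byte sequence.
std::u16string toUtf16(const AnyView &v) {
    if (v.tag() == AnyView::Tag::U16)
        return std::u16string(v.utf16());
    std::u16string out;
    const std::string_view s = v.bytes();
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (v.tag() == AnyView::Tag::L1 || c < 0x80) { // one byte, one unit
            out.push_back(char16_t(c));
            ++i;
            continue;
        }
        const std::size_t len = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : 2;
        if (i + len > s.size())
            break;
        char32_t cp = c & (0x7F >> len);               // payload of lead byte
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3Fu);
        i += len;
        if (cp < 0x10000) {
            out.push_back(char16_t(cp));
        } else {                                       // surrogate pair
            cp -= 0x10000;
            out.push_back(char16_t(0xD800 + (cp >> 10)));
            out.push_back(char16_t(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// The single "overload": every argument combination converts to AnyView.
std::u16string replaceAll(std::u16string str, AnyView before, AnyView after) {
    const std::u16string b = toUtf16(before), a = toUtf16(after);
    if (b.empty())
        return str;
    for (std::size_t pos = str.find(b); pos != std::u16string::npos;
         pos = str.find(b, pos + a.size()))
        str.replace(pos, b.size(), a);
    return str;
}
```

With this, calls such as `replaceAll(u"a-b-c", "-", u"+")` (UTF-8 and UTF-16 arguments) and `replaceAll(u"caf\u00E9", Latin1{"\xE9"}, std::string("e"))` (Latin-1 and std::string arguments) all resolve to the same function, with no per-combination overloads.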

Unlike QStringView, QAnyStringView is a pure interface type. I won't add much in the way of parsing API to it, even though I acknowledge that's a slippery slope: while it would be easy to add trimmed(), and tokenize() would be really interesting, QAnyStringView should not be used for parsing. That's what we have the three non-variant string view types for. Being a pure interface type means we can add more "dangerous" conversions. For example, QStringView can't be constructed from a QStringBuilder, because it's almost impossible to make that work without referencing destroyed data:

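   QStringView s = u'c' + QString::number(x); // oops: would view a temporary that dies at the end of this line
   QString c = u'c' + QString::number(x);     // materialize the QString first...
   QStringView s = c;                         // ...then view it: ok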

But QAnyStringView supports this:

   str.replace(name, name % "_1");
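The reason a parameter-only type can accept such expressions is C++'s temporary-lifetime rule: a temporary lives until the end of the full-expression that created it, which covers the whole function call. A minimal illustration with std::string_view standing in for the string view types (the function names are hypothetical):

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Only reads its argument while the call is running, like a function
// that takes a view type by value.
std::size_t countChar(std::string_view s, char c) {
    std::size_t n = 0;
    for (char ch : s)
        if (ch == c)
            ++n;
    return n;
}

std::size_t underscoresAfterSuffix(const std::string &name) {
    // Unsafe, by contrast (don't do this): the temporary dies at the end of
    // the declaration and the view dangles:
    //     std::string_view s = name + "_1";

    // Safe: the temporary std::string from operator+ is destroyed only at
    // the end of this full-expression, i.e. after countChar() has returned.
    return countChar(name + "_1", '_');
}
```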

In summary: 25 overloads is just way too much (and don't forget regex, which adds another five).

The replace() problem is also present with relational operators and basically wherever we have two QString arguments right now.

QAnyStringView solves this in the sense that one overload can replace many. The complexity is still there (a binary visitation of a QAnyStringView produces nine instantiations of the visitor, one per combination of the three possible encodings, though that can be reduced to six in many cases), but many implementations fall into one of just two classes: 1) the function would just call toString() on the any-string-view anyway, in which case the QString construction is taken out of user code and centralized in the library. If you think that doesn't matter, look at the tst_qstatemachine numbers in

https://codereview.qt-project.org/c/qt/qtbase/+/301595 (-10KiB just from temporary QString creation and destruction)

2) the complexity is already there and QAnyStringView helps in reducing it:

  https://codereview.qt-project.org/c/qt/qtbase/+/303483 (QCalendar)
  https://codereview.qt-project.org/c/qt/qtbase/+/303512 (QColor)
  https://codereview.qt-project.org/c/qt/qtbase/+/303707 (arg())
  https://codereview.qt-project.org/c/qt/qtbase/+/303708 (QUuid)
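The nine visitor instantiations come from combining three possible encodings (Latin-1, UTF-8, UTF-16) on each side. The sketch below models that with std::variant and std::visit; the type names are hypothetical, and QAnyStringView does not necessarily use std::variant internally.

```cpp
#include <string>
#include <string_view>
#include <variant>

// Hypothetical encoding-tagged views standing in for the three states a
// variant string view can be in.
struct Latin1 { std::string_view data; };
struct Utf8   { std::string_view data; };
struct Utf16  { std::u16string_view data; };
using AnyView = std::variant<Latin1, Utf8, Utf16>;

inline const char *encodingName(const Latin1 &) { return "L1"; }
inline const char *encodingName(const Utf8 &)   { return "U8"; }
inline const char *encodingName(const Utf16 &)  { return "U16"; }

// One generic binary visitor. std::visit instantiates its body once per
// (lhs alternative, rhs alternative) pair: 3 x 3 = 9 instantiations.
// One plausible way to get down to six: for symmetric operations, (A, B)
// and (B, A) can share code, leaving 3 same-encoding + 3 mixed pairs.
std::string describe(const AnyView &lhs, const AnyView &rhs) {
    return std::visit([](const auto &l, const auto &r) {
        return std::string(encodingName(l)) + "/" + encodingName(r);
    }, lhs, rhs);
}
```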


Another aspect I'd like to mention is that QAnyStringView also helps with getting rid of QLatin1String in Qt 7: instead of having QL1S strewn around the Qt API as we do now, we'd have just the QAnyStringView(QLatin1String) ctor to deprecate.

Finally, of course, QAnyStringView improves the integration of Qt with other C++ libraries, because it transparently accepts almost any string type that exists out there (thanks to Peppe's Magic QStringView ctor, which QUtf8StringView and QAnyStringView inherit).
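The idea behind such a "magic" constructor is to accept any contiguous container of a compatible character type. A simplified C++17 sketch of that idea follows; the class U16View and its constraint are hypothetical and Qt's actual constructor may differ in its details.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical toy UTF-16 view (not QStringView): accepts any container
// whose std::data() yields a pointer to char16_t; size from std::size().
// Note: char16_t arrays also match, with the terminating null counted;
// a real implementation would treat arrays specially.
class U16View {
    template <typename C>
    using DataElement = std::remove_cv_t<std::remove_pointer_t<
        decltype(std::data(std::declval<const C &>()))>>;

    template <typename C>
    using IfCompatible =
        std::enable_if_t<std::is_same_v<DataElement<C>, char16_t>, bool>;

public:
    template <typename C, IfCompatible<C> = true>
    U16View(const C &c) : m_data(std::data(c)), m_size(std::size(c)) {}

    const char16_t *data() const { return m_data; }
    std::size_t size() const { return m_size; }

private:
    const char16_t *m_data;
    std::size_t m_size;
};

// One function, callable with std::u16string, std::u16string_view,
// std::vector<char16_t>, or any other compatible container.
inline std::size_t codeUnits(U16View v) { return v.size(); }
```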

I was very sceptical when, some months ago, someone on this ML suggested making QString hold either UTF-8 or UTF-16 data, and I still am, but in an explicit variant string view type, this concept suddenly makes a lot of sense.


Now that I hopefully have convinced you that we need QAnyStringView, where to go from here?

Given the lack of time until Qt 6.0, I'd like to propose that we just replace all overload sets that contain QL1S with a single overload taking QAnyStringView.

The implementation usually contains the optimized handling of L1 data already, and can often be easily extended to UTF-8, too, cf. QColor, QUuid, arg().

This should really happen for Qt 6, because it will greatly clean up our lower-level APIs and tell a consistent story.
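One reason extending the Latin-1 handling to UTF-8 is often cheap: many of these formats (colour names, UUIDs, numbers) consist solely of ASCII characters, ASCII bytes have the same values in Latin-1 and UTF-8, and any non-ASCII byte is invalid in either encoding. A byte-oriented parser therefore serves both. A hypothetical sketch for a "#RRGGBB" string (the function names are made up; this is not QColor's code):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Value of an ASCII hex digit, or -1. Bytes >= 0x80 (every non-ASCII
// Latin-1 character and every byte of a UTF-8 multi-byte sequence) fail
// all the range checks, so the input encoding doesn't matter.
constexpr int hexValue(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Parses "#RRGGBB" from Latin-1 *or* UTF-8 bytes into 0xRRGGBB.
std::optional<std::uint32_t> parseRgb(std::string_view bytes) {
    if (bytes.size() != 7 || bytes[0] != '#')
        return std::nullopt;
    std::uint32_t rgb = 0;
    for (std::size_t i = 1; i < 7; ++i) {
        const int v = hexValue(bytes[i]);
        if (v < 0)
            return std::nullopt;
        rgb = (rgb << 4) | std::uint32_t(v);
    }
    return rgb;
}
```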

On top of that, we can also think of replacing overload sets that contain QString and (QStringView or QStringRef) with one overload taking QAnyStringView, or of changing QString functions that typically get passed constants (like setObjectName()) to take QAnyStringView, but I agree with Lars that there's not enough time and man-power to bring this to a conclusion for Qt 6.

Thanks,
Marc

[1] First, we have a lot of existing QLatin1String use in code, both in Qt itself and in code that has been run through tools like Clazy. Users of QLatin1String know why they use this class - it's either to silence QT_NO_CAST_FROM_ASCII or because there's a QLatin1String overload that they call and that prevents a QString creation. Either way, these developers will not react kindly if a solution recommended in Qt 5 suddenly gets either removed or heavily pessimized in Qt 6.

Second, UTF-8 is a multi-byte encoding, like UTF-16. Unlike L1 -> UTF-16, however, the number of code units needed to represent an L1 character in U8 is not constant (one byte for ASCII, two for the rest of Latin-1). That means that important optimisations like

bool operator==(QLatin1String lhs, QStringView rhs) { return lhs.size() == rhs.size() && ~~~~; }

no longer work:

bool operator==(QUtf8StringView lhs, QStringView rhs) { return lhs.size() == rhs.size() && ~~~~; } // NOPE! sizes count different units
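A single character is enough to break the size check. A small compile-time illustration with the standard string view types, using escapes so the result does not depend on the source-file encoding:

```cpp
#include <string_view>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE in three encodings:
constexpr std::string_view    eLatin1 = "\xE9";      // 1 byte
constexpr std::u16string_view eUtf16  = u"\u00E9";   // 1 UTF-16 code unit
constexpr std::string_view    eUtf8   = "\xC3\xA9";  // 2 bytes

// Latin-1 -> UTF-16 maps one code unit to one code unit, so equal strings
// always have equal sizes and a size mismatch proves inequality...
static_assert(eLatin1.size() == eUtf16.size());
// ...whereas equal UTF-8 and UTF-16 strings can differ in size.
static_assert(eUtf8.size() != eUtf16.size());
```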

If you think this doesn't matter, think again: it's the reason why, in C++20, the original design of <=> was changed to synthesize only <, >, <= and >=, and no longer also == and !=. If you still don't believe it, look at the kind of if-else chain that probably already exists somewhere (uic comes to mind):

    if (name == QLatin1String("name")) {
        ~~~~
    } else if (name == QLatin1String("type")) {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

All of these comparisons start with a size check, whereas

    if (name == "widget") {
        ~~~~
    } else if (name == "type") {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

cannot. They immediately go into the strcmp loop. Now imagine there's a rather common prefix to all these tags...

    if (name == "qt_impl_widget") {
        ~~~~
    } else if (name == "qt_impl_type") {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

and you see where I'm going.
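To put numbers on it, here is a hypothetical sketch (standard C++, not Qt code; the function names are made up) that counts character comparisons when a UTF-16 token is compared against such tags. The comparator cannot know in advance that a UTF-8 literal happens to be pure ASCII, so it cannot use the byte count as the UTF-16 length. For brevity, the UTF-8 path only handles ASCII bytes; real code would also have to decode multi-byte sequences, which only makes it slower.

```cpp
#include <cstddef>
#include <string_view>

// Latin-1 literal vs UTF-16 token: sizes are directly comparable, so a
// length mismatch rejects the candidate without touching any character.
bool equalsLatin1(std::u16string_view token, std::string_view l1,
                  std::size_t &compares) {
    if (token.size() != l1.size())
        return false;
    for (std::size_t i = 0; i < token.size(); ++i) {
        ++compares;
        if (token[i] != char16_t(static_cast<unsigned char>(l1[i])))
            return false;
    }
    return true;
}

// UTF-8 literal vs UTF-16 token: the byte count doesn't give the UTF-16
// length, so we walk until the first mismatch. Simplification: any
// non-ASCII byte is treated as a mismatch instead of being decoded.
bool equalsUtf8Ascii(std::u16string_view token, std::string_view u8,
                     std::size_t &compares) {
    std::size_t i = 0;
    for (; i < token.size() && i < u8.size(); ++i) {
        ++compares;
        const unsigned char b = static_cast<unsigned char>(u8[i]);
        if (b >= 0x80 || token[i] != char16_t(b))
            return false;
    }
    return i == token.size() && i == u8.size();
}
```

Rejecting `qt_impl_type` for the token `qt_impl_widget` costs no character comparisons on the Latin-1 path (the sizes, 12 and 14, differ) but nine on the UTF-8 path: eight for the shared `qt_impl_` prefix and one for the mismatch. That cost is paid again for every tag in the chain.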
_______________________________________________
Development mailing list
[email protected]
https://lists.qt-project.org/listinfo/development
