[Development] QAnyStringView

Marc Mutz via Development Tue, 23 Jun 2020 02:36:03 -0700

Hi,

I went to the drawing board and drew up a variant string view class. It's here: https://codereview.qt-project.org/c/qt/qtbase/+/301594

Here's why I think we need it. At the end of the email, I also suggest how we should go about introducing it into Qt.

Thiago and Lars are meanwhile convinced that we need a QUtf8StringView, too. As for QAnyStringView, Lars sees some merit in it for low-level APIs, while Thiago remains unconvinced.

I have come to believe that QUtf8StringView without QAnyStringView won't fly: introducing QUtf8StringView on its own will explode the number of mixed-type operations we need to support. If we don't remove anything, we're talking about


- QString
- QStringRef*
- QStringView
- QByteArray
- QByteArrayView
- QUtf8StringView
- QLatin1String
- char16_t
- QChar
- char8_t
- char
- QLatin1Char
- const char*
- const char16_t*
- const char8_t*

and anything I've forgotten. The best we can do to condense this down is to revoke the string-ness of QByteArray, and we'd be left with


- QStringView
- QLatin1String
- QUtf8StringView
- QChar

the latter would have to accept plain char again, something we ASCII_DEPRECATED years ago, but which should be reconsidered under the new src-is-UTF-8 paradigm.

Lars would probably say that we could also drop QLatin1String, with which I disagree[1].

Assuming for the sake of argument that we need those four types, consider QString::replace(). Experience shows that stuff like QStringBuilder expressions being passed will require an actual QString overload to be present, too. Ignoring existing overloads and regexp, we'd need 5x5=25 overloads. I won't enumerate them here. What I will enumerate is the complete set of overloads when using QAnyStringView:

QString& QString::replace(QAnyStringView, QAnyStringView, Qt::CaseSensitivity);


That's it.
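
To make the overload arithmetic concrete, here is a minimal sketch in plain standard C++ (no Qt). AnyView, toUtf16() and replaceAll() are hypothetical names made up for this illustration and are not the API from the change above; narrow data is assumed to be Latin-1 and wide data UTF-16, and the UTF-8 alternative is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a variant string view (NOT QAnyStringView itself).
// Assumption: narrow data is Latin-1, wide data is UTF-16.
struct AnyView {
    std::variant<std::string_view, std::u16string_view> data;

    // Implicit on purpose: this is an interface type meant to absorb callers' types.
    AnyView(const char* s) : data(std::string_view(s)) {}
    AnyView(const std::string& s) : data(std::string_view(s)) {}
    AnyView(const char16_t* s) : data(std::u16string_view(s)) {}
    AnyView(const std::u16string& s) : data(std::u16string_view(s)) {}
};

// Convert whatever the view refers to into an owned UTF-16 string.
std::u16string toUtf16(const AnyView& v) {
    return std::visit([](auto view) {
        using View = decltype(view);
        if constexpr (std::is_same_v<View, std::u16string_view>) {
            return std::u16string(view);
        } else {
            // Latin-1 -> UTF-16 is a 1:1 widening of each byte.
            std::u16string out;
            out.reserve(view.size());
            for (char c : view)
                out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
            return out;
        }
    }, v.data);
}

// The single "overload": every supported argument combination lands here.
std::u16string& replaceAll(std::u16string& s, AnyView before, AnyView after) {
    // Owned copies first, so arguments that alias s stay valid while s changes.
    const std::u16string b = toUtf16(before);
    const std::u16string a = toUtf16(after);
    if (b.empty())
        return s;
    for (std::size_t pos = s.find(b); pos != std::u16string::npos;
         pos = s.find(b, pos + a.size())) {
        s.replace(pos, b.size(), a);
    }
    return s;
}
```

With this, replaceAll(s, "-", u"+") and replaceAll(s, std::string("name"), std::u16string(u"name_1")) both resolve to the same function; the mixed-type handling lives in toUtf16() instead of in the overload set.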

Unlike QStringView, QAnyStringView is a pure interface type. I won't add much in the way of parsing API to it, even though I acknowledge that's a slippery slope. While it would be easy to add trimmed(), and tokenize() would be really interesting, QAnyStringView should not be used for parsing. That's what we have the three non-variant string view types for. Being a pure interface type means we can add more "dangerous" conversions. QStringView can't be constructed from a QStringBuilder, e.g., because it's almost impossible to make that work without referencing destroyed data:


   QStringView s = u'c' + QString::number(x); // oops
   QString c = u'c' + QString::number(x);
   QStringView s = c; // ok

But QAnyStringView supports this:

   str.replace(name, name % "_1");
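
The reason this is safe for a parameter but not for a named view can be shown with the standard-library analogue. This is only a sketch using std::string and std::string_view, under the assumption that the consumer reads its argument during the call and does not store it; countUnderscores() and demo() are made-up names, and nothing here claims to be how the QStringBuilder conversion is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical consumer: reads the view during the call, never stores it.
std::size_t countUnderscores(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == '_')
            ++n;
    }
    return n;
}

std::size_t demo(const std::string& name) {
    // The temporary std::string produced by operator+ lives until the end of
    // the full expression, i.e. until countUnderscores() has returned, so the
    // view passed as the argument never dangles.
    return countUnderscores(name + "_1");

    // By contrast, binding the same temporary to a named view would dangle:
    //   std::string_view v = name + "_1";   // temporary dies at the ';'
}
```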

In summary: 25 overloads is just way too much (and don't forget regex, which adds another five).

The replace() problem is also present with relational operators and basically wherever we have two QString arguments right now.

QAnyStringView solves this in the sense that one overload can replace many overloads. The complexity is still there: a binary visitation of a QAnyStringView produces nine instantiations of the visitor (though that can be reduced to six in many cases). But many implementations fall into one of just two classes: 1) a function would just call toString() on the any-string-view anyway, in which case the QString construction is taken out of user code and centralized in the library. If you think that doesn't matter, look at the tst_qstatemachine numbers in

https://codereview.qt-project.org/c/qt/qtbase/+/301595 (-10KiB just from temporary QString creation and destruction)

2) the complexity is already there and QAnyStringView helps in reducing it:


  https://codereview.qt-project.org/c/qt/qtbase/+/303483 (QCalendar)
  https://codereview.qt-project.org/c/qt/qtbase/+/303512 (QColor)
  https://codereview.qt-project.org/c/qt/qtbase/+/303707 (arg())
  https://codereview.qt-project.org/c/qt/qtbase/+/303708 (QUuid)
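
The nine-versus-six count mentioned above can be checked with a small standard-C++ sketch. The three view structs and distinctVisitorBodies() are hypothetical names for this illustration. The reduction shown (swapping the arguments of a symmetric operation so that the lower-indexed encoding always comes first) is one plausible way to get from nine to six, assumed here and not necessarily what the patches do. Note also that std::visit itself still instantiates all nine bodies in this sketch; the code only counts which combinations are reached at run time, and a hand-written dispatch would be needed to avoid generating the other three.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical tag-like view types, one per encoding.
struct Latin1View { std::string_view bytes; };
struct Utf8View   { std::string_view bytes; };
struct Utf16View  { std::u16string_view units; };

using AnyView = std::variant<Latin1View, Utf8View, Utf16View>;

constexpr int kind(const Latin1View&) { return 0; }
constexpr int kind(const Utf8View&)   { return 1; }
constexpr int kind(const Utf16View&)  { return 2; }

// Counts the distinct (lhs type, rhs type) combinations a binary visitor is
// entered with. With normalizeOrder, arguments are swapped so that the
// lower-indexed alternative always comes first (valid for symmetric
// operations such as equality).
std::size_t distinctVisitorBodies(bool normalizeOrder) {
    const AnyView samples[] = { Latin1View{}, Utf8View{}, Utf16View{} };
    std::set<std::pair<int, int>> seen;
    for (const AnyView& lhs : samples) {
        for (const AnyView& rhs : samples) {
            const bool swapped = normalizeOrder && lhs.index() > rhs.index();
            std::visit(
                [&seen](const auto& l, const auto& r) {
                    seen.insert({kind(l), kind(r)});
                },
                swapped ? rhs : lhs, swapped ? lhs : rhs);
        }
    }
    return seen.size();
}
```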

Another aspect that I'd like to mention is how QAnyStringView also helps with getting rid of QLatin1String for Qt 7: Instead of having QL1S strewn around the Qt API as we have now, we'd have just the QAnyStringView(QLatin1String) ctor that we'd need to deprecate.

Finally, of course, QAnyStringView increases integration of Qt with other C++ libraries, because it now transparently accepts almost any string type that exists out there (thanks to Peppe's Magic QStringView ctor that QUtf8StringView and QAnyStringView inherit).

I was very sceptical when some months ago someone on this ML suggested to make QString hold either UTF8 or UTF16 data, and I still am, but in an explicit variant string view type, this concept suddenly makes a lot of sense.

Now that I hopefully have convinced you that we need QAnyStringView, where to go from here?

Given the lack of time until Qt 6.0, I'd like to propose to just replace all overload sets that contain QL1S with one overload taking QAnyStringView.

The implementation usually contains the optimized handling of L1 data already, and can often be easily extended to UTF-8, too, cf. QColor, QUuid, arg().

This should really happen for Qt 6, because it will greatly clean up our lower-level APIs and tell a consistent story.

On top of that, we can also think of replacing overload sets that contain QString and (QStringView or QStringRef) with one overload taking QAnyStringView, or QString functions that typically get passed constants (like setObjectName()), but I agree with Lars that there's not enough time and manpower to bring this to a conclusion for Qt 6.


Thanks,
Marc

[1] First, we have a lot of existing QLatin1String use in code, both in Qt itself, as well as in code that has seen e.g. Clazy. Users of QLatin1String know why they use this class - it's either to silence QT_NO_CAST_FROM_ASCII or because there's a QLatin1String overload that they call and that prevents a QString creation. Either way, these developers will not react kindly if a recommended-in-Qt-5 solution suddenly gets either removed or heavily pessimized in Qt 6.

Second, UTF-8 is a multi-byte encoding, like UTF-16. Unlike L1 -> UTF-16, however, the number of code units needed to represent L1 in U8 is not constant. That means that important optimisations like

bool operator==(QLatin1String lhs, QStringView rhs) { return lhs.size() == rhs.size() && ~~~~; }


no longer work:

bool operator==(QUtf8StringView lhs, QStringView rhs) { return lhs.size() == rhs.size() // NOPE!
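
A quick way to see the size mismatch, as a sketch in standard C++: latin1ToUtf16() and latin1ToUtf8() are helper names made up for this example. Every Latin-1 byte maps to exactly one UTF-16 code unit, but bytes at or above 0x80 need two bytes in UTF-8, so equal text no longer implies equal size().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Latin-1 -> UTF-16: always exactly one code unit per input byte.
std::u16string latin1ToUtf16(std::string_view l1) {
    std::u16string out;
    out.reserve(l1.size());
    for (char c : l1)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    return out;
}

// Latin-1 -> UTF-8: one byte for ASCII, two bytes for 0x80..0xFF.
std::string latin1ToUtf8(std::string_view l1) {
    std::string out;
    for (char c : l1) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80) {
            out.push_back(static_cast<char>(u));
        } else {
            out.push_back(static_cast<char>(0xC0 | (u >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (u & 0x3F))); // continuation
        }
    }
    return out;
}
```

For the four Latin-1 bytes of "caf\xE9", the UTF-16 form has four code units while the UTF-8 form has five bytes, so a size comparison between UTF-8 and UTF-16 data would wrongly reject equal strings.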

If you think this doesn't matter, think again: it's the reason why in C++20 the original design of <=> was changed to only synthesize <, >, <=, >= and no longer also ==, !=. If you still don't believe me, look at some if-else-chain that probably already exists somewhere (uic comes to mind):


    if (name == QLatin1String("name")) {
        ~~~~
    } else if (name == QLatin1String("type")) {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

all of these start with a size check whereas

    if (name == "widget") {
        ~~~~
    } else if (name == "type") {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

cannot. They immediately go into the strcmp loop. Now imagine there's a rather common prefix to all these tags...


    if (name == "qt_impl_widget") {
        ~~~~
    } else if (name == "qt_impl_type") {
        ~~~~ 50 other tokens ~~~~
    } else {
        // error
    }

and you see where I'm going.
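
To put rough numbers on it, here is a sketch in standard C++ that counts character comparisons for such a chain. All names are hypothetical, and both sides use std::string_view for simplicity; the scanning variant models a strcmp-like loop that only learns about a length difference when it runs off the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// strcmp-like comparison: walks both strings and counts every character
// comparison; a length difference is only noticed at the very end.
bool equalByScanning(std::string_view a, std::string_view b, std::size_t& steps) {
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        ++steps;
        if (a[i] != b[i])
            return false;
    }
    return a.size() == b.size();
}

// Comparison with an O(1) size check up front, as the QLatin1String
// overloads can do.
bool equalWithSizeCheck(std::string_view a, std::string_view b, std::size_t& steps) {
    if (a.size() != b.size())
        return false; // rejected without looking at any character
    return equalByScanning(a, b, steps);
}

// Runs a two-branch if-else chain and returns the total number of
// character comparisons needed until a branch matches (or none does).
std::size_t stepsForChain(std::string_view name, bool sizeCheckFirst) {
    constexpr std::string_view candidates[] = { "qt_impl_widget", "qt_impl_type" };
    std::size_t steps = 0;
    for (std::string_view candidate : candidates) {
        const bool match = sizeCheckFirst
            ? equalWithSizeCheck(name, candidate, steps)
            : equalByScanning(name, candidate, steps);
        if (match)
            break;
    }
    return steps;
}
```

Looking up "qt_impl_type" costs 21 character comparisons when scanning (nine wasted on the shared prefix of the first candidate) against 12 with the size check; with 50 tokens in the chain the wasted prefix scans add up accordingly.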
_______________________________________________
Development mailing list
https://lists.qt-project.org/listinfo/development
