Re: [Development] Oslo, we have a problem [char8_t]

Arnaud Clere Tue, 09 Jul 2019 00:22:52 -0700

> -----Original Message-----
> From: Thiago Macieira <[email protected]> 
> > On Monday, 8 July 2019 12:42:51 -03 Arnaud Clere wrote:
> > > -----Original Message-----
> > > From: Thiago Macieira <[email protected]>
> > > 
> > > I am not completely convinced of the benefit of adding an owning UTF-8 
> > > string class, though I very much agree with a view over UTF-8 strings.
> > > The reason is not the string class itself (alone it is definitely 
> > > useful), but the fact that it would muddy the waters as to what 
> > > string classes one should use in API. We might end up with some API using 
> > > UTF-8 and some UTF-16.
> > 
> > Indeed, this is already the case: QJsonDocument::toJson() returns a 
> > QByteArray
>
> Which is the expected behaviour, as it returns something suitable for 
> transfer over a socket, pipe to a process or to be saved in a file, ...
>
> > on which users can conveniently call toUpper() until some data from the 
> > field makes them understand it does not work...
>
> And there's little we can do to prevent that. Even if we removed 
> QByteArray::toUpper and left it only in QLatin1String, people would still 
> find ways to uppercase.


We could have a specific type (or trait ?!) for "QByteArray"s containing utf8 
data that would enable the compiler to pinpoint some of these bugs, whereas 
presently, they can only be detected with appropriate input data.
If this *utf8* type can also be manipulated as a raw QByteArray, it does not 
change anything for code that just transfers the bytes from one place to the 
other.
I am pretty sure that letting the compiler know that the bytes returned by 
QJsonDocument::toJson are actually utf8 would help fix latent bugs in a lot of 
code.
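
To make the idea concrete, here is a minimal sketch, not a proposal for actual 
Qt API. Everything in it is hypothetical: std::string stands in for QByteArray 
so that it builds without Qt, and the names Utf8Bytes, toUpper, canToUpper and 
transfer are made up for illustration. The tagged type converts implicitly to 
raw bytes, while a deleted overload makes byte-wise upper-casing of utf8 data a 
compile error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Assumption: std::string stands in for QByteArray so the sketch builds without Qt.
using ByteArray = std::string;

// Stand-in for a byte-oriented case conversion such as QByteArray::toUpper().
ByteArray toUpper(ByteArray bytes)
{
    for (char &c : bytes) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return bytes;
}

// Hypothetical tagged type: a byte array known to contain UTF-8.
class Utf8Bytes
{
public:
    explicit Utf8Bytes(ByteArray bytes) : m_bytes(std::move(bytes)) {}

    // Still usable as raw bytes, so code that only moves bytes around is unaffected.
    operator const ByteArray &() const { return m_bytes; }

private:
    ByteArray m_bytes;
};

// Byte-wise case conversion of UTF-8 data is rejected at compile time.
ByteArray toUpper(const Utf8Bytes &) = delete;

// Compile-time probe: is the expression toUpper(T) well-formed?
template <typename T>
auto canToUpper(int) -> decltype(toUpper(std::declval<T>()), std::true_type{});
template <typename T>
std::false_type canToUpper(...);

static_assert(decltype(canToUpper<ByteArray>(0))::value,
              "raw bytes can still be upper-cased");
static_assert(!decltype(canToUpper<Utf8Bytes>(0))::value,
              "UTF-8 bytes cannot be upper-cased byte-wise");

// Hypothetical consumer that only transfers bytes (socket, pipe, file, ...).
std::size_t transfer(const ByteArray &payload) { return payload.size(); }

void exampleTaggedType()
{
    Utf8Bytes json{ByteArray("{\"k\":\"v\"}")};  // e.g. what toJson() could return
    assert(transfer(json) == 9);                 // implicit conversion to raw bytes
    // toUpper(json);                            // would not compile: deleted overload
}
```

Code that only forwards the bytes keeps working through the implicit 
conversion; only the call that is wrong for utf8 stops compiling.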

> That's the reason I would prefer to keep it, with well-defined and 
> locale-independent semantics.

And I am with you regarding your suggestion (elsewhere in the thread) that 
QByteArray functions operating on specific charsets like toUpper should be 
restricted to the ASCII subset common to latin1/utf8.
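
A small sketch of why the ASCII restriction is safe for both encodings, while 
Latin-1 rules are not. It is an illustration only: std::string stands in for 
QByteArray, and asciiToUpper and latin1ToUpper are hypothetical helpers, not Qt 
functions.

```cpp
#include <cassert>
#include <string>

// Assumption: std::string stands in for QByteArray so the sketch builds without Qt.
using ByteArray = std::string;

// ASCII-only upper-casing: only 'a'..'z' change; every byte >= 0x80 is left alone,
// so the result is the same whether the input is Latin-1 or UTF-8.
ByteArray asciiToUpper(ByteArray bytes)
{
    for (char &c : bytes) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return bytes;
}

// Hypothetical Latin-1 upper-casing, for contrast: also maps 0xE0..0xFE
// (except 0xF7, the division sign) to 0xC0..0xDE.
ByteArray latin1ToUpper(ByteArray bytes)
{
    for (char &c : bytes) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u >= 'a' && u <= 'z')
            c = static_cast<char>(u - 'a' + 'A');
        else if (u >= 0xE0 && u <= 0xFE && u != 0xF7)
            c = static_cast<char>(u - 0x20);
    }
    return bytes;
}

void exampleCaseMapping()
{
    const ByteArray euro = "\xE2\x82\xAC";  // U+20AC EURO SIGN encoded as UTF-8

    // ASCII-only: multi-byte sequences pass through untouched.
    assert(asciiToUpper("a" + euro) == "A" + euro);

    // Latin-1 rules rewrite the lead byte 0xE2 to 0xC2; the three bytes are
    // then no longer well-formed UTF-8.
    assert(latin1ToUpper(euro) == "\xC2\x82\xAC");
}
```

This is the kind of corruption that only shows up once real-world data 
contains non-ASCII characters.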

> > It may be argued too that COW is not interesting for such strings and APIs 
> > can be fixed by using u8string, but then, you ask Qt users to master both 
> > QString and std::string-like APIs...
>
> We don't ask users to use std::string APIs. That is not a text class, 
> std::string is analogous to QByteArray. C++ does not have a text container 
> class and that's not going to come until at least 2023 (C++2b).

I know. I was thinking aloud about using the future std::u8string... which 
presumably exhibits a std::string-like API to which Qt users would not be 
accustomed... I am not advocating for using it.
