Re: [Development] Oslo, we have a problem [char8_t]

André Hartmann Sun, 07 Jul 2019 02:59:44 -0700

Hi Thiago,

> But QByteArray is encoding-indeterminate since it can carry any type.

Correct, it is often used as "generic raw data array", e.g. in QFile,
Q*Socket, QSerialPort, QCanBusFrame etc. Here we really need to treat
the data as-is, without interpretation.

> Arguably, toUpper() and toLower() should be removed, since
>
>    QByteArray(u8"Résumé").toLower()
> is mojibake.

I vote against that. If you got the "raw" data from a device as
described above, you might want to do .toHex().toUpper() which is fully
valid.

So either:

- restrict the functions that currently operate on Latin-1 to pure ASCII
- or give a possibility to select the encoding in QByteArray
  (e.g. by adding fromLatin1() and fromUtf8()). That would also
  add the possibility to correctly operate on UTF-8 strings.
  But probably a separate QString8 class would be better.
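
To illustrate the first option, here is a minimal standalone sketch in
plain C++ (not Qt code). The helpers latin1ToLower() and asciiToLower()
are hypothetical names invented for this example; they only mimic the two
behaviours under discussion. Lowering UTF-8 bytes as Latin-1 turns the
lead byte 0xC3 of "é" into 0xE3, which breaks the sequence, while an
ASCII-only version leaves every byte >= 0x80 alone.

```cpp
#include <string>

// Hypothetical helper: lowercases bytes as if they were Latin-1 text.
// In Latin-1, 0xC0..0xDE (except 0xD7, the multiplication sign) are
// uppercase letters whose lowercase forms sit 0x20 higher.
std::string latin1ToLower(std::string s)
{
    for (char &c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if ((u >= 'A' && u <= 'Z') || (u >= 0xC0 && u <= 0xDE && u != 0xD7))
            c = static_cast<char>(u + 0x20);
    }
    return s;
}

// Hypothetical helper: lowercases only ASCII 'A'..'Z'.
// Bytes >= 0x80 (all UTF-8 multibyte units) pass through unchanged.
std::string asciiToLower(std::string s)
{
    for (char &c : s) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c + 0x20);
    }
    return s;
}
```

With the UTF-8 bytes of "Résumé" ("R\xC3\xA9" "sum\xC3\xA9") as input,
latin1ToLower() yields 0xE3 0xA9 where "é" used to be, whereas
asciiToLower() keeps 0xC3 0xA9 intact. Hex output such as "DEADBEEF" is
pure ASCII, so the ASCII-only variant is enough for the toHex() use case.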

> I wouldn't mind a udata() function anyway, since there's a lot of code
> dealing with "bytes" as unsigned char.

+1

> Are we willing to add ubegin() and begin8() too?

If that all fits in one class, why not?!

Best regards,
André

On 06.07.19 18:59, Thiago Macieira wrote:

On Saturday, 6 July 2019 11:09:36 -03 Mutz, Marc via Development wrote:

Anyway, QByteArray has *Latin1* text-manipulation functions (toUpper and
toLower), its split(char) function will happily split on individual bytes of a
UTF-8 multibyte sequence, so adding char8_t overloads seems just wrong to me.
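
The split concern can be reproduced without Qt. The sketch below uses a
hypothetical splitBytes() function as a stand-in for a byte-wise
split(char); it is an assumption about the behaviour described above, not
the actual QByteArray implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a byte-wise split(char):
// cuts at every byte equal to sep, with no notion of encoding.
std::vector<std::string> splitBytes(const std::string &s, char sep)
{
    std::vector<std::string> parts;
    std::string current;
    for (char c : s) {
        if (c == sep) {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current); // trailing piece, possibly empty
    return parts;
}
```

Splitting the UTF-8 bytes of "aéb" ("a\xC3\xA9" "b") on '\xA9' gives the
pieces "a\xC3" and "b": the first ends in a dangling lead byte and is no
longer valid UTF-8.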


const char* in Qt is always assumed to be UTF-8-encoded. You need to use
QLatin1String to have it interpreted as Latin-1:

https://doc.qt.io/qt-5/qstring.html#QString-8
https://doc.qt.io/qt-5/qstring.html#QString-7


That's QString, not QByteArray.

But QByteArray is encoding-indeterminate since it can carry any type.
Arguably, toUpper() and toLower() should be removed, since

        QByteArray(u8"Résumé").toLower()
is mojibake.

In fact, QByteArray should use std::byte in functions like data(), but that's
unwieldy and breaks too much compatibility.

What did you try to use QByteArray with that showed problems?


Just QByteArray(u8"Hello") already fails when compiled with -std=c++2a.
And this is also why we need to fix it. The same compiles fine in C++17,
and does the expected thing.


I think we need to talk to SG16.

We can add the template overloads to all functions so we can take char,
unsigned char, std::byte and char8_t without complaining. I am with you that
this could result in explosive compile times[1]. But it also does not solve
the problem of what type data() / constData() and the iteration functions
return.

I wouldn't mind a udata() function anyway, since there's a lot of code dealing
with "bytes" as unsigned char. Are we willing to add ubegin() and begin8()
too?
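
As a rough sketch of what such overloads could look like: ByteBuffer and
isByteLike below are hypothetical names, not a proposed QByteArray API.
The constructor is a template constrained to the byte-like types listed
above, and udata() exposes the same storage as unsigned char.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical trait: the element types the template overloads would accept.
template <typename T>
constexpr bool isByteLike =
    std::is_same_v<T, char> || std::is_same_v<T, unsigned char> ||
    std::is_same_v<T, std::byte>
#ifdef __cpp_char8_t
    || std::is_same_v<T, char8_t>
#endif
    ;

// Hypothetical container, used only to illustrate the overload set.
class ByteBuffer
{
public:
    // One template instead of four hand-written overloads.
    template <typename T, std::enable_if_t<isByteLike<T>, int> = 0>
    ByteBuffer(const T *data, std::size_t size) : m_bytes(size)
    {
        if (size)
            std::memcpy(m_bytes.data(), data, size);
    }

    const char *data() const { return m_bytes.data(); }

    // Same bytes, viewed as unsigned char.
    const unsigned char *udata() const
    {
        return reinterpret_cast<const unsigned char *>(m_bytes.data());
    }

    std::size_t size() const { return m_bytes.size(); }

private:
    std::vector<char> m_bytes;
};
```

This accepts ByteBuffer(u8"Hi", 2) in both C++17 and C++20 mode, but it
leaves open the question raised above: data() still has to pick one
return type, which is why separate accessors such as udata() come up.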

[1] Please, no one say "Modules!" here, it's not a full solution, even if we
can use them in Qt 6's lifetime.


_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development