> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Thiago Macieira
> Sent: Monday, April 14, 2014 7:34 PM
> To: [email protected]
> Subject: Re: [Development] utf-8 BOM and parsers

Hi Thiago,

Thanks for listing the reasons here in detail!

> On Monday, 14 April 2014, at 09:59:18, Thiago Macieira wrote:
> > Also, the Unix philosophy is that UTF-8 BOMs should not be used. This
> > started on Windows, with tools like Notepad, where changing the
> > system locale is not an option.

It's mostly an issue with (files edited on) Windows, indeed. 

> To be clear: BOMs are to be used to determine that the content *is* UTF-8.
> Once you know that it is UTF-8, you can strip it and pass to the decoder.
> Passing the BOM to the decoder sounds wrong because you'd be expecting
> it to choose the codec when decoding. That's what Notepad does: if there's a
> BOM, it decodes as UTF-8; otherwise it decodes as ANSI.

Right. But the issue is that the 'easiest' way to get a file into a QString so
far is

QFile file;
// ...
QString::fromUtf8(file.readAll());

We're using that pattern btw in both Qt and Qt Creator, too. This breaks now in 
ways that can be pretty subtle (given that it only affects files starting with 
a BOM, and that the BOM isn't displayed usually).
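
A minimal sketch of the "strip it, then decode" step for code that relies on
the pattern above. It uses std::string so that it stands alone; stripUtf8Bom
is a hypothetical helper, not a Qt function. With Qt, the same check would
presumably be applied to the QByteArray before it is handed to
QString::fromUtf8().

```cpp
#include <string>

// Hypothetical helper (not part of Qt): drops a single leading UTF-8 BOM
// (the bytes EF BB BF) so that it never reaches the decoder.
// A U+FEFF that appears later in the data is left untouched.
std::string stripUtf8Bom(std::string bytes)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (bytes.compare(0, bom.size(), bom) == 0)
        bytes.erase(0, bom.size());
    return bytes;
}
```

Input without a BOM, or shorter than three bytes, comes back unchanged.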

> Having the BOM there also breaks roundtrip:
> 
>       QString bom = u"\ufeff" "any string goes here";
>       QCOMPARE(QString::fromUtf8(bom.toUtf8()), bom);
> 
> QString::toUtf8 does not, cannot and will never add the BOM. It would break
> concatenation.

So you'd have to add a BOM explicitly to the file before writing, if you really 
want it.
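
Roughly like this, as a sketch only: withUtf8Bom is a hypothetical helper
name, and std::string stands in for whatever byte container holds the
already-encoded UTF-8 data.

```cpp
#include <string>

// Hypothetical helper (not part of Qt): prefixes already-encoded UTF-8 data
// with the BOM bytes EF BB BF. Meant to be called once, right before the
// bytes are written to the file, and never on fragments that are
// concatenated later.
std::string withUtf8Bom(const std::string &utf8)
{
    return std::string("\xEF\xBB\xBF") + utf8;
}
```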

> I know this is a behaviour change. But I repeat that it is an *intentional*
> change.
>
> The U+FEFF character is called "zero-width non-breaking space" (ZWNBSP)
> anywhere else, so it's valid to appear there. Including the next character
> in a file.

Right, though as I understand it, that use of U+FEFF has been deprecated since
Unicode 3.2 (released in 2002).

All in all, I see a lot of code breaking with this change ... Given that, I'd
like to give a +1 from my side for reverting to the previous behavior for 5.3.

My 2 cents

Kai
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
