Hi,

Just came back from vacation today.
Unfortunately, BOMs at the beginning of files still seem to be used quite a bit, especially in the Windows world. So I would actually vote for option 1 and rather keep compatibility. The reason is that stripping the BOM will not break anything, but leaving it in will.

We could also consider using our built-in UTF-8 decoder for all UTF-8 locales, so that we don't use iconv or ICU if the locale is UTF-8 (and thus always strip the BOM). That would at least give us consistent cross-platform behaviour.

Cheers,
Lars

On 16/04/14 17:03, "Thiago Macieira" <[email protected]> wrote:

>On Mon 14 Apr 2014, at 10:33:48, Thiago Macieira wrote:
>> On Mon 14 Apr 2014, at 09:59:18, Thiago Macieira wrote:
>> > Also, the Unix philosophy is that UTF-8 BOMs should not be used. This
>> > started on Windows, with tools like Notepad, where changing the system
>> > locale is not an option.
>>
>> To be clear: BOMs are to be used to determine that the content *is* UTF-8.
>> Once you know that it is UTF-8, you can strip it and pass to the decoder.
>> Passing the BOM to the decoder sounds wrong because you'd be expecting it
>> to choose the codec when decoding. That's what Notepad does: if there's a
>> BOM, it decodes as UTF-8; otherwise it decodes as ANSI.
>>
>> Having the BOM there also breaks roundtrip:
>>
>>     QString bom = u"\ufeff" "any string goes here";
>>     QCOMPARE(QString::fromUtf8(bom.toUtf8()), bom);
>>
>> QString::toUtf8 does not, cannot and will never add the BOM. It would
>> break concatenation.
>>
>> I know this is a behaviour change. But I repeat that it is an
>> *intentional* change.
>>
>> The U+FEFF character is called "zero-width non-breaking space" (ZWNBSP)
>> anywhere else, so it's valid to appear there. Including the next
>> character in a file.
>
>Lars, can you make a call?
>
>Options are:
>1) revert to old behaviour, change the content creators to never add a BOM
>
>2) same as above, but fix the parsers now and change the behaviour in
>QString in Qt 5.4 or 5.5
>
>3) keep the new behaviour, document it in the changelog, change the
>content creators as above, and fix the parsers
>
>--
>Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center
>
>_______________________________________________
>Development mailing list
>[email protected]
>http://lists.qt-project.org/mailman/listinfo/development
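A minimal sketch of the BOM handling debated in this thread, written in standard C++17 rather than Qt so it is self-contained. The helper names `hasUtf8Bom`, `stripUtf8Bom`, `encodeWithBom`, and `demoConcatenation` are hypothetical and are not Qt API. The sketch treats the BOM as Thiago's first point describes: a signature that the bytes are UTF-8, removed before decoding (the behaviour option 1 would keep). `encodeWithBom` is a deliberately wrong encoder that illustrates why `QString::toUtf8` cannot emit a BOM: joining two of its outputs leaves a U+FEFF in the middle, where a leading-BOM strip does not reach it.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// UTF-8 encoding of U+FEFF (BOM / ZWNBSP): the bytes EF BB BF.
constexpr std::string_view kUtf8Bom = "\xEF\xBB\xBF";

// True if the raw bytes start with a UTF-8 BOM.
bool hasUtf8Bom(std::string_view bytes) {
    return bytes.substr(0, kUtf8Bom.size()) == kUtf8Bom;
}

// Uses the BOM only as a "this is UTF-8" signature and drops it, so the
// decoder never sees U+FEFF at the very start. Later U+FEFF are untouched.
std::string_view stripUtf8Bom(std::string_view bytes) {
    if (hasUtf8Bom(bytes))
        bytes.remove_prefix(kUtf8Bom.size());
    return bytes;
}

// Hypothetical, deliberately wrong encoder: prefixes a BOM on every call.
std::string encodeWithBom(std::string_view utf8) {
    return std::string(kUtf8Bom) + std::string(utf8);
}

// Concatenating two BOM-prefixed outputs: stripping removes only the
// leading BOM, so the second one survives mid-stream as a stray ZWNBSP.
void demoConcatenation() {
    const std::string joined = encodeWithBom("first") + encodeWithBom("second");
    const std::string_view decoded = stripUtf8Bom(joined);
    assert(decoded.substr(0, 5) == "first");
    assert(decoded.substr(5, kUtf8Bom.size()) == kUtf8Bom);  // stray U+FEFF
}
```

Only the first three bytes are ever examined, so a U+FEFF further into the text is left alone and would decode as ZWNBSP, consistent with Thiago's note that the character is valid elsewhere.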
