On Tuesday 15 April 2014, Koehne Kai wrote:
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On
> > Behalf Of Thiago Macieira
> > Sent: Monday, April 14, 2014 7:34 PM
> > To: [email protected]
> > Subject: Re: [Development] utf-8 BOM and parsers
>
> Hi Thiago,
>
> Thanks for listing the reasons here in detail!
>
> > On Mon 14 Apr 2014, at 09:59:18, Thiago Macieira wrote:
> > > Also, the Unix philosophy is that UTF-8 BOMs should not be used. This
> > > started on Windows, with tools like Notepad, where changing the
> > > system locale is not an option.
>
> It's mostly an issue with (files edited on) Windows, indeed.
>
> > To be clear: BOMs are to be used to determine that the content *is*
> > UTF-8. Once you know that it is UTF-8, you can strip it and pass to the
> > decoder. Passing the BOM to the decoder sounds wrong because you'd be
> > expecting it to choose the codec when decoding. That's what Notepad does:
> > if there's a BOM, it decodes as UTF-8; otherwise it decodes as ANSI.
>
> Right. But the issue is that the 'easiest' way to get a file into a QString
> so far is
>
>   QFile file;
>   // ...
>   QString::fromUtf8(file.readAll());
>
> We're using that pattern btw in both Qt and Qt Creator, too. This breaks
> now in ways that can be pretty subtle (given that it only affects files
> starting with a BOM, and that the BOM isn't displayed usually).
>
> > Having the BOM there also breaks roundtrip:
> >   QString bom = u"\ufeff" "any string goes here";
> >   QCOMPARE(QString::fromUtf8(bom.toUtf8()), bom);
> >
> > QString::toUtf8 does not, cannot and will never add the BOM. It would
> > break concatenation.
>
> So you'd have to add a BOM explicitly to the file before writing, if you
> really want it.
>

BOM has no official meaning and function other than as a zero-width
non-breaking space in UTF-8. It was only meant as a byte-order marker in
16- and 32-bit Unicode. If you add it to Unix files it breaks other magic
markers at the beginning of the file. UTF-8 BOM is a Windows-specific
non-standard hack that is recommended against. So yes, anyone that wants it
needs to add it themselves, as it becomes part of the text content on any
other platform.
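
To make the "strip it before decoding, add it yourself before writing" point concrete, here is a minimal sketch in plain standard C++ rather than Qt. The helper names (hasUtf8Bom, stripUtf8Bom, withUtf8Bom) are made up for illustration and are not part of any Qt API; the same idea would apply to the raw bytes read from a file before they are handed to a UTF-8 decoder.

```cpp
#include <string>

// U+FEFF encoded in UTF-8 is the three bytes EF BB BF.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";
static const std::string::size_type kUtf8BomLen = 3;

// Hypothetical helper: true if the raw bytes start with the UTF-8 BOM.
bool hasUtf8Bom(const std::string &bytes)
{
    return bytes.compare(0, kUtf8BomLen, kUtf8Bom) == 0;
}

// Hypothetical helper: drop a leading BOM so the decoder never sees it.
// Bytes without a BOM are returned unchanged.
std::string stripUtf8Bom(const std::string &bytes)
{
    return hasUtf8Bom(bytes) ? bytes.substr(kUtf8BomLen) : bytes;
}

// Hypothetical helper: prepend a BOM explicitly before writing, for the
// case where the consumer insists on one. Does not add a second BOM.
std::string withUtf8Bom(const std::string &bytes)
{
    return hasUtf8Bom(bytes) ? bytes : std::string(kUtf8Bom) + bytes;
}
```

Keeping the BOM handling at the file boundary like this leaves the encoder and decoder free of it, so concatenating encoded strings and round-tripping them both keep working.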
`Allan
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
