> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Thiago Macieira
> Sent: Monday, April 14, 2014 7:34 PM
> To: [email protected]
> Subject: Re: [Development] utf-8 BOM and parsers

Hi Thiago,

Thanks for listing the reasons here in detail!

> On Mon, 14 Apr 2014, at 09:59:18, Thiago Macieira wrote:
> > Also, the Unix philosophy is that UTF-8 BOMs should not be used. This
> > started on Windows, with tools like Notepad, where changing the
> > system locale is not an option.

It's mostly an issue with (files edited on) Windows, indeed.

> To be clear: BOMs are to be used to determine that the content *is* UTF-8.
> Once you know that it is UTF-8, you can strip it and pass to the decoder.
> Passing the BOM to the decoder sounds wrong because you'd be expecting
> it to choose the codec when decoding. That's what Notepad does: if there's a
> BOM, it decodes as UTF-8; otherwise it decodes as ANSI.

Right. But the issue is that the 'easiest' way to get a file into a QString so far is

    QFile file;
    // ...
    QString::fromUtf8(file.readAll());

We're using that pattern, by the way, in both Qt and Qt Creator, too. This now breaks in ways that can be pretty subtle (given that it only affects files starting with a BOM, and that the BOM usually isn't displayed).

> Having the BOM there also breaks roundtrip:
>
> QString bom = u"\ufeff" "any string goes here";
> QCOMPARE(QString::fromUtf8(bom.toUtf8()), bom);
>
> QString::toUtf8 does not, cannot and will never add the BOM. It would break
> concatenation.

So you'd have to add a BOM explicitly to the file before writing, if you really want it.

> I know this is a behaviour change. But I repeat that it is an *intentional*
> change.
>
> The U+FEFF character is called "zero-width non-breaking space" (ZWNBSP)
> anywhere else, so it's valid to appear there. Including the next character in
> a file.

Right, though as I understand it, this use has been deprecated since Unicode 3.2 (released in 2002).

All in all, I see a lot of code breaking with this change ... Given that, I'd like to give a +1 from my side for reverting to the previous behavior for 5.3.
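
For code that keeps using the readAll() pattern above, one possible approach is to strip a leading BOM from the raw bytes before decoding, as Thiago describes. A minimal sketch in plain C++, assuming std::string as a stand-in for QByteArray; the helper name stripUtf8Bom is hypothetical and not part of Qt:

```cpp
#include <string>

// Hypothetical helper (not Qt API): drops a leading UTF-8 BOM, i.e. the
// bytes EF BB BF, if the buffer starts with one. std::string stands in
// for QByteArray in this sketch.
std::string stripUtf8Bom(const std::string &bytes)
{
    static const char bom[] = "\xEF\xBB\xBF";
    // compare() clamps the range to the buffer size, so inputs shorter
    // than three bytes simply do not match and are returned unchanged.
    if (bytes.compare(0, 3, bom) == 0)
        return bytes.substr(3);
    return bytes;
}
```

In Qt code the same check would be applied to the QByteArray returned by file.readAll() before it is passed to QString::fromUtf8().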
My 2 cents

Kai
