On 07.05.2017 12:17, Florian Klaempfl via Lazarus wrote:
> On 07.05.2017 at 12:11, Sven Barth via Lazarus wrote:
>> On 07.05.2017 12:07, "Florian Klaempfl via Lazarus" wrote:
>>>
>>> On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
>>>> On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>>>>>> Yeah, that would be the logical thing to do.
>>>>>
>>>>> Why? What makes a string literal UTF-8?
>>>>>
>>>>
>>>> As Mattias said, the fact that the source unit is UTF-8 encoded.
>>>> Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
>>>> unit is UTF-8 encoded, the literal string constant can't (and
>>>> shouldn't) be in any other encoding.
>>>>
>>>> I would say the same if the source unit was stored in UTF-16
>>>> encoding. Then string literals would be treated as UTF-16.
>>>
>>> And if an ISO/ANSI codepage is given? Things would probably fail.
>>>
>>> The point is: FPC is consistent in this regard: sources with a
>>> given ISO/ANSI codepage are also handled the same way. If there is a
>>> string literal with non-ASCII chars, it is converted to UTF-16 using
>>> the codepage of the source. Very simple, very logical. It is a matter
>>> of preference whether UTF-8, -16 or -32 is chosen at this point, but
>>> FPC uses UTF-16. If it used UTF-8, the problem would occur the other
>>> way around.
>>>
>>> If no codepage is given (by directive, command line, BOM), string
>>> literals are handled byte-wise as raw strings.
>>
>> Small correction: FPC only does this conversion if the codepage is
>> UTF-8, no other.
>
> Then something is wrong/broken :)
>
Well, the code in tscannerfile.readtoken() only does the conversion to
UTF-16 if the source codepage is UTF-8; otherwise it only converts to
UTF-16 if the string is already a UTF-16 string. So it is probably not
broken, since it looks rather deliberate; if anything, it's wrong...

Regards,
Sven
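
To make the difference between the two descriptions in this thread
concrete, here is a rough model in Python. It is only a sketch of the
behaviour as described above, not the scanner code from FPC: the
function names, the ("raw" / "utf16") tags and the use of None for
"no codepage given" are all made up for illustration, and the case of a
string that "is already a UTF-16 string" is not modelled at all.

```python
import codecs
from typing import Optional, Tuple

Literal = Tuple[str, bytes]  # (hypothetical tag, payload bytes)


def _to_utf16(raw: bytes, codepage: str) -> Literal:
    # Decode the source bytes with the source codepage, then store the
    # result as UTF-16 code units (little-endian chosen arbitrarily).
    return ("utf16", raw.decode(codepage).encode("utf-16-le"))


def literal_any_codepage(raw: bytes, codepage: Optional[str]) -> Literal:
    """Rule as first described: any declared codepage triggers conversion."""
    if codepage is None or raw.isascii():
        return ("raw", raw)  # no codepage or pure ASCII: bytes kept as-is
    return _to_utf16(raw, codepage)


def literal_utf8_only(raw: bytes, codepage: Optional[str]) -> Literal:
    """Rule as in the correction: only a UTF-8 source triggers conversion."""
    if codepage is None or raw.isascii():
        return ("raw", raw)
    if codecs.lookup(codepage).name == "utf-8":
        return _to_utf16(raw, codepage)
    # Assumption for this sketch: other codepages leave the bytes untouched.
    return ("raw", raw)


if __name__ == "__main__":
    e_acute_utf8 = "é".encode("utf-8")      # bytes as stored in a UTF-8 source
    e_acute_latin1 = "é".encode("latin-1")  # bytes as stored in a Latin-1 source
    print(literal_any_codepage(e_acute_utf8, "utf8"))
    print(literal_utf8_only(e_acute_utf8, "utf8"))
    print(literal_any_codepage(e_acute_latin1, "latin-1"))
    print(literal_utf8_only(e_acute_latin1, "latin-1"))
```

In this model the two rules agree for a UTF-8 source and for a source
without any codepage, and differ only for a non-ASCII literal in a
source with some other declared codepage.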
