On Mon, 11 Jul 2016 10:57:57 +0100 Graeme Geldenhuys <[email protected]> wrote:
> On 2016-07-10 06:20, Martin Schreiber wrote:
> > We always can write that "UnicodeString" is the wrong name for a reference
> > counted utf-16 string because UTF8String or AnsiString with default code
> > page set to utf-8 also is Unicode in order to express our anger about the
> > bad marketing driven decision of the Delphi owners.
>
> G*d, I so agree with that too! I simply hate the name "UnicodeString"
> implicitly implying UTF-16 only. "Unicode" is an algorithm with 3
> official encodings, not just UTF-16.

You know well that the name UnicodeString came from Delphi, where it
fits, because it is their only string type supporting Unicode.
No one forces you to use this name in your code. You can define your
own alias type.

> Then to boot, they introduced the AnsiString mess in FPC 3.0 - which now
> doesn't only mean ANSI encoding (contrary to what the name suggests), it
> now means Unicode encodings too.

1. AnsiString comes from the Microsoft "ANSI code pages", which were never
   an ANSI standard at all, so the term "Ansi" was a misnomer from the
   beginning.
2. MS accepted that and nowadays calls them just "code pages". But many
   of their documentation pages still use the term "ANSI code page".
3. The Unicode Consortium added UTF-8, designed specifically for legacy
   code using 8-bit strings.
4. Microsoft added the UTF-8 code page 65001 (and also code pages for
   UTF-16 and UTF-32), but no version of MS Windows used it as the system
   code page.

FPC's AnsiString uses the MS code page numbers, which include UTF-8.
The new FPC 3.0 strings made it easier to use UTF-8 strings (you need
fewer conversions, and more RTL functions support Unicode) while still
keeping compatibility.

>[...]

Mattias

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus-ide.org/listinfo/lazarus
