On Sat, 22 Nov 2014, Mattias Gaertner wrote:
> On Sat, 22 Nov 2014 16:18:09 +0100
> Jürgen Hestermann <[email protected]> wrote:
>
>> On 2014-11-22 at 15:06, Mattias Gaertner wrote:
>>> procedure TForm1.FormCreate(Sender: TObject);
>>> var s: string; // String = AnsiString because of $H+
>>> begin
>>> s:=GetCommandLineW;
>>> // GetCommandLineW returns a UTF-16 PWideChar
>>> // the compiler adds code to convert this to the
>>> // default system codepage (CP_ACP = CP_UTF8)
>>> // the resulting string has StringCodePage CP_ACP
>>> // and is encoded in UTF-8.
>>> // therefore you can simply use it with the LCL
>>
>> Okay.
>> Does that mean that the compiler *always* assumes that
>> String = UTF-8 encoded AnsiString?
>
> Yes, with the UTF8 RTL. The default RTL uses the system codepage.

Careful, there is no such thing as the "UTF8 RTL".
There is now a "Unicode and codepage-aware RTL".
That means it has:
- Codepage-aware single-byte strings.
  The codepage of a string may or may not be UTF-8 (i.e. Unicode).
- WideStrings (Unicode).
The compiler handles conversion between codepages transparently.
The codepage-aware single-byte strings are not automatically UTF-8.
On Linux this is probably so, but on Windows it is not necessarily so.
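
To make the point about transparent conversion concrete, here is a minimal sketch, assuming FPC 3.0 or later. TLatin1String is a name made up for this example, and 1252 is just one possible non-UTF-8 codepage. On Unix, conversions rely on a widestring manager, which is why cwstring is pulled in there.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses cwstring;  // widestring manager, needed for codepage conversions on Unix
{$endif}

type
  // Hypothetical alias for this sketch: a single-byte string fixed to Windows-1252.
  TLatin1String = type AnsiString(1252);

var
  W: UnicodeString;
  U: UTF8String;
  L: TLatin1String;
begin
  W := WideChar($E9);  // U+00E9 (e with acute accent) as UTF-16
  U := W;              // the compiler inserts a UTF-16 -> UTF-8 conversion
  L := U;              // the compiler inserts a UTF-8 -> CP1252 conversion

  // Same character, different codepages, so the byte lengths can differ:
  // U+00E9 takes two bytes in UTF-8 and one byte in Windows-1252.
  WriteLn('U: codepage ', StringCodePage(U), ', bytes ', Length(U));
  WriteLn('L: codepage ', StringCodePage(L), ', bytes ', Length(L));
end.
```

No explicit conversion call appears anywhere; the assignments alone trigger it, based on the declared codepage of each string type.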
Additionally, most basic file I/O routines now correctly call the underlying
OS's file routines with the encoding the OS expects (WideString, i.e. UTF-16, on Windows).
The exact behaviour of the RTL is controlled by a few variables:
DefaultSystemCodePage, DefaultFileSystemCodePage, DefaultRTLFileSystemCodePage.
See http://wiki.freepascal.org/FPC_Unicode_support.
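
A quick way to see what your own platform uses is to print the three variables. The sketch below assumes FPC 3.0 or later. The call to SetMultiByteConversionCodePage is only there to illustrate how a program could switch plain AnsiString to UTF-8; whether you want that depends on your application.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

begin
  // Values depend on the OS and its locale settings.
  WriteLn('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);

  // Assumption for this sketch: the program wants AnsiString (CP_ACP)
  // to mean UTF-8 from here on. This changes DefaultSystemCodePage.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage now    = ', DefaultSystemCodePage);
end.
```

The numbers printed by the first three lines will differ between Linux and Windows, which is exactly why String should not be assumed to be UTF-8 by default.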
Michael.
--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus