Mattias Gaertner wrote:
On Tue, 02 Dec 2014 22:41:04 +0100
Hans-Peter Diettrich <drdiettri...@aol.com> wrote:

[...]
I can see two major problems with the current FPC AnsiString model. The first problem is the strange FPC convention that a string variable's dynamic encoding can differ from its static encoding, and not only for RawByteString. That convention (flaw) can require an explicit SetCodePage for every string parameter, because a string argument of, e.g., static type CP_OEM (for console output) can actually carry any other (dynamic) encoding, which is not useful when passing the string to an external function.

The FPC sources need SetCodePage only in the RTL and only either for
codepage conversion functions or for Default(RTL)FileSystemCodePage.
It seems it is not a "major" problem for Lazarus users.

Let's see. I am currently trying to debug AnsiUpperCase, which doesn't seem to work.
Q: How do I debug the RTL (step into)?


The next problem results from the Delphi-incompatible dynamic encoding of CP_ACP(=0), which seems to be used when a literal is stored in an AnsiString. These strings have the encoding assumed at *compile time*, perhaps from a {$codepage ...} switch, which can differ from the DefaultSystemCodePage at *runtime*. The conversion routines then assume that the string is encoded according to DefaultSystemCodePage, which is not necessarily true:

var
   A: AnsiString;
begin
   a := ' äöü';
   WriteLn('CP_ACP=',DefaultSystemCodePage);
   WriteLn('Ansi CP=',StringCodePage(a),' Len=',Length(a),' ="',a,'"');
end.

This reports (on Windows) CP_ACP=1252 and string CP=0. Because the Lazarus File Encoding is UTF-8, the string literal and the variable contain UTF-8 (Len=7), as assumed by the compiler. WriteLn then attempts to convert the string to CP_OEM from encoding 0, which TranslatePlaceholderCP maps to DefaultSystemCodePage (=1252 at runtime). The result is that the UTF-8 bytes are converted to CP_OEM as if they were CP 1252 :-(
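
For illustration only, here is a minimal sketch of how the dynamic code page of such a string could be relabeled without converting its bytes. It is an assumption of this sketch that the bytes really are UTF-8. It is not necessarily one of the approaches referred to elsewhere in this thread. The program name RelabelDemo is made up, and the literal is written as byte constants so that the result does not depend on the encoding of the source file.

```pascal
program RelabelDemo;
{$mode objfpc}{$H+}

var
  A: AnsiString;
begin
  // The UTF-8 bytes of ' äöü', given as byte constants (7 bytes in total).
  A := ' '#$C3#$A4#$C3#$B6#$C3#$BC;
  WriteLn('Before: CP=', StringCodePage(A), ' Len=', Length(A));

  // Assumption: the bytes are UTF-8. With Convert=False only the code page
  // stored with the string is changed; the bytes themselves stay untouched.
  SetCodePage(RawByteString(A), CP_UTF8, False);
  WriteLn('After:  CP=', StringCodePage(A), ' Len=', Length(A));

  // Conversions now start from the dynamic code page UTF-8 instead of
  // from the placeholder CP_ACP.
  WriteLn('"', A, '"');
end.
```

Whether the last line displays correctly still depends on the console code page containing these characters.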

I described two ways in my other mail how to handle that.

I don't want workarounds for a flawed FPC implementation; I want an FPC that works on Windows without hacks.


About the example:
Writeln on the Windows Console requires the console codepage and is
therefore limited to characters of this codepage.

That's perfectly sufficient for my tests.

If your code contains
literals for a specific Windows codepage then you are limiting
yourself to that codepage (not x-platform). That is your choice.
OTOH Lazarus' main target is x-platform programs. For example the
UTF8ToConsole solution works on Unix too, while your CP1252
example does not.

What's CP1252 specific in my example?

With FPC 2.7.1 there is a new possibility.

Please note that I *am* using and writing about FPC 2.7.1.

With the new UTF-8 mode
your example gives:

CP_ACP=65001
Ansi CP=0 Len=7 =" äöü"

This works on Unix too, while the CP1252 example does not.
Under Windows it works if the console codepage contains "äöü" (which
can be more than one codepage). Basically the compiler adds the
UTF8ToConsole for you.

This works only for a DefaultSystemCodePage of UTF-8, see your CP_ACP encoding shown above :-(

If this doesn't change, the string encodings are quite useless, and a single AnsiString type with the fixed encoding CP_UTF8 would be sufficient (and faster, due to omitted string conversions). Windows users may not like that; some prefer to use the default Windows codepage or UTF-16 instead (Delphi compatible).


[...]
Delphi string literals, in contrast, come with their true dynamic encoding, which can never be 0, and thus can be assigned and shown properly. The code above will then show CP=1252 and Len=4 for the AnsiString variable.

No, it should show garbage and Len=7, because the source is UTF-8,
while the compiler treats it as your system codepage.

Well, I tested my program with XE, with the default Windows textfile encoding. If FPC or Lazarus has problems with such a program file, then something is flawed :-(

DoDi

