Michael Schnell wrote:
Hi Experts,
[.....]
When I want to simply assign a constant text "ö2" to a WideString, I
would think that I just write s := 'ö2'; . But I found that this does
not work: it creates a WideString of length 3 that contains the three
8-bit code units of the UTF-8-encoded string "ö2", each zero-extended
to 16 bits and stored in its own WideChar element. For me this is very
surprising and incompatible with the same code (s := 'ö2'; ) used in a
Turbo-Delphi program.
Obviously, unlike Turbo-Delphi, which uses ANSIString here, a
constant string gets UTF8String as its intermediate type. This might
be a useful definition, but if it is done this way, why does an
assignment WideString := UTF8String not implicitly call UTF8Decode as
a type conversion? In my example it calls fpc_ansistr_to_widestr
instead, just as if the UTF8String were an ANSIString.
I am not an expert, but here is what I believe I know:
This is the result of 2 (hidden) "features":
AFAIK the compiler reads the source as non-utf8 (Latin-1 or some other
8-bit encoding). This has other consequences too, for example that
identifiers cannot contain utf8.
The string within the quotes is a byte sequence to the compiler, and the
compiler does not know that it is utf8. From your description I take it
that the compiler translates those 3 "8bit chars" into 3 16bit chars
(whether this translation is correct for the 8bit source encoding is
another question).
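
To make the difference concrete, here is a small sketch of the two
interpretations of the same three bytes. It is my own illustration, not
the compiler's actual code path: the program name and variable names are
made up, and the loop only imitates the "one byte becomes one WideChar"
effect described above. The bytes are set explicitly so that the result
does not depend on how the source file is saved.

```pascal
program WideDemo;
{$mode objfpc}{$H+}
var
  raw: AnsiString;
  w: WideString;
  i: Integer;
begin
  // The UTF-8 bytes of "ö2": C3 B6 32
  SetLength(raw, 3);
  raw[1] := Chr($C3);
  raw[2] := Chr($B6);
  raw[3] := Chr($32);

  // Interpretation 1: zero-extend every byte into its own WideChar
  SetLength(w, Length(raw));
  for i := 1 to Length(raw) do
    w[i] := WideChar(Ord(raw[i]));
  WriteLn(Length(w));   // 3

  // Interpretation 2: decode the bytes as UTF-8
  w := UTF8Decode(raw);
  WriteLn(Length(w));   // 2
  WriteLn(Ord(w[1]));   // 246, i.e. U+00F6 "ö"
end.
```

So an explicit UTF8Decode on the literal should give the expected
2-character WideString, at the cost of writing the call by hand.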
Lazarus uses UTF8 for everything, so it will save your string as utf8. If
your string were kept as an ansistring, the compiler would treat it as
bytes and pass it through unchanged, so any code expecting utf8 would be
fine.
You can try telling Lazarus to save your file as latin1. As long as all
your strings fit into latin1, this may work, if and only if the compiler
translates the latin1 into the correct WideChars.
It will not work for anything that is not in latin1. AFAIK Lazarus currently
doesn't save in ucs2 (or any 16 bit encoding). But even if Lazarus did,
since the compiler wants an 8bit encoding, your whole source would be broken.
Not much help, I know. Maybe someone else has more ideas / knowledge.
Is there some compiler setting to change this?
-Michael
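
On the question of a setting: one thing that may be worth trying is the
{$codepage utf8} directive (or -Fcutf8 on the command line), which tells
the compiler that the source file is UTF-8. The following is only a
sketch with a made-up program name. It assumes the file really is saved
as UTF-8, and I have not checked how it behaves across compiler versions.

```pascal
program CodepageDemo;
{$mode objfpc}{$H+}
{$codepage utf8}   // declare the source encoding; assumes the file is saved as UTF-8
var
  s: WideString;
begin
  s := 'ö2';
  // With the directive the literal should be decoded as UTF-8,
  // so the expected length is 2 rather than 3.
  WriteLn(Length(s));
end.
```

Note that the directive may also change how the same literal behaves
when it is assigned to an AnsiString, so it is worth testing both cases
before relying on it.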
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel