Michael Schnell wrote:
Hi Experts,
[.....]
When I want to assign the constant text "ö2" to a WideString, I would expect to simply write s := 'ö2';. But I found that this does not work: it creates a WideString of length 3 that contains the three 8-bit code units of the UTF-8-encoded string "ö2", each zero-extended to 16 bits and stored in one WideChar element. I find this very surprising, and it is incompatible with the same code (s := 'ö2';) in a Turbo Delphi program.

Apparently, unlike Turbo Delphi, which uses AnsiString here, a constant string gets UTF8String as its intermediate type. This might be a useful definition, but if it is done this way, why does an assignment WideString := UTF8String not implicitly call UTF8Decode as a type conversion? In my example it calls fpc_ansistr_to_widestr instead, just as if the UTF8String were an AnsiString.

I am not an expert, but here is what I believe I know:

This is the result of two (hidden) "features":

AFAIK the compiler reads the source as non-UTF-8 (Latin-1 or some other 8-bit encoding). This has other consequences too, for example identifiers cannot contain UTF-8 characters.

To the compiler, the string within the quotes is a byte sequence, and the compiler does not know that it is UTF-8. From your description I take it that the compiler translates those three 8-bit chars into 16-bit chars (whether this translation is correct for the 8-bit source encoding is another question).
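To make the effect concrete, here is a small Python sketch of the byte arithmetic (it is not FPC code, and the variable names are mine). It assumes the source file was saved as UTF-8 and that each byte is simply zero-extended to a 16-bit code unit, which matches the length-3 result reported above but is only my guess at what the compiler does:

```python
# Illustration only: mimics the reported behaviour, it is not what FPC runs.
literal = "ö2"

# Bytes as a UTF-8 editor (for example Lazarus) would save the literal.
utf8_bytes = literal.encode("utf-8")

# Assumed compiler behaviour: every byte becomes one 16-bit code unit,
# with the value unchanged (zero-extended).
widened = list(utf8_bytes)
wrong = "".join(chr(unit) for unit in widened)

# What an explicit UTF-8 decode (the UTF8Decode idea) would give instead.
right = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), [hex(unit) for unit in widened])
print(len(wrong), len(right))
```

Zero-extending bytes is the same as decoding them as Latin-1, which is why the result looks like an AnsiString conversion rather than a UTF-8 decode.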

Lazarus uses UTF-8 for everything, so it will save your string as UTF-8. If your string were kept as an AnsiString, the compiler would treat it as bytes and pass it through unchanged, so any code expecting UTF-8 would be fine.

You can try telling Lazarus to save your file as Latin-1. As long as all your strings fit into Latin-1, this may work, if and only if the compiler translates the Latin-1 bytes into the correct WideChars.
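A rough Python sketch of why the Latin-1 route could work for "ö2" but not in general (again not FPC code; fits_latin1 is a hypothetical helper of mine, and the byte widening is the same assumption as before):

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character can be stored as one Latin-1 byte."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Saved as Latin-1, "ö2" is two bytes, one per character.
latin1_bytes = "ö2".encode("latin-1")

# Assumed compiler behaviour: zero-extend each byte to a 16-bit code unit.
# Latin-1 byte values equal the Unicode code points U+0000..U+00FF,
# so this widening happens to give the intended characters.
widened = "".join(chr(b) for b in latin1_bytes)

print(len(latin1_bytes), widened == "ö2")
print(fits_latin1("ö2"), fits_latin1("€"))
```

A character such as "€" has no Latin-1 byte at all, so a file containing it could not be saved that way in the first place.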

It will not work for anything that does not fit into Latin-1. AFAIK Lazarus currently doesn't save in UCS-2 (or any 16-bit encoding). But even if Lazarus did, the compiler wants an 8-bit encoding, so your whole source would be broken.

Not much help, I know. Maybe someone else has more ideas or knowledge.

Is there some compiler setting to change this?

-Michael

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel