On 09/03/2014 09:36, Mattias Gaertner wrote:
On Sun, 09 Mar 2014 01:30:36 +0100
Giuliano Colla <[email protected]> wrote:
[...]
I was aware of that. My problem is that the char I must add to the UTF8
string is calculated at run time, and is in the Unicode range $A0-$BF.
The Unicode ranges are given in "code points". These are abstract
values that must be encoded in bytes. The most common
encodings are UTF-8 and UTF-16.
Code point $A0 is encoded as two bytes in UTF-8: $C2 $A0.
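To make the code point versus byte distinction concrete, here is a minimal sketch. It is written in Python rather than Pascal only so that it runs anywhere, and the helper name `utf8_bytes` is made up for this illustration.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 byte sequence for one Unicode code point."""
    return chr(code_point).encode("utf-8")


# Both ends of the range $A0-$BF need two bytes in UTF-8.
for cp in (0xA0, 0xBF):
    encoded = utf8_bytes(cp)
    print("U+%04X ->" % cp, " ".join("$%02X" % b for b in encoded))
# prints:
# U+00A0 -> $C2 $A0
# U+00BF -> $C2 $BF
```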
I had assumed (wrongly) that the compiler was smart enough to convert a
type "char" to UTF8,
A char is not a code point. A char is an element of a string.
Every byte encoding consists of chars, and so does UTF-8.
when concatenating it to a UTF8 string. Instead it
turns out that the character is appended as it is, which leads to an
invalid UTF8 sequence (a lone byte above 127), which displays as a crossed box.
IMHO that's an FPC bug.
It's not a bug.
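The effect of appending the raw byte can be reproduced outside FPC. The following is only a sketch in Python, and the helper name `is_valid_utf8` is hypothetical: it appends the single byte $A0 to a UTF-8 byte string, then appends the two-byte encoding instead.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


base = "abc".encode("utf-8")

# Appending the raw byte $A0: a continuation byte with no lead byte.
raw_append = base + bytes([0xA0])

# Appending the UTF-8 encoding of code point $A0 instead.
encoded_append = base + chr(0xA0).encode("utf-8")

print(is_valid_utf8(raw_append))      # False
print(is_valid_utf8(encoded_append))  # True
```

A renderer that meets the first byte string has nothing sensible to draw for the stray byte, which is consistent with the crossed box described above.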
When I realized that, I then tried to explicitly convert the Unicode
char to UTF8, but again I failed, this time because of the default
behavior which is to map char <-> Unicode only in the range 0-127.
That's because UTF-8 maps Unicode 0-127 to one byte with the same
value as the code point.
Above that it uses a different mapping.
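The two mappings can be written out by hand. This is a sketch under stated assumptions, not code from FPC or Lazarus: the function name `encode_utf8_small` is made up, it is written in Python only for easy testing, and it deliberately covers only code points below $800 (one-byte and two-byte sequences).

```python
def encode_utf8_small(code_point: int) -> bytes:
    """Encode a code point below $800 as UTF-8 (illustrative helper)."""
    if code_point < 0:
        raise ValueError("code point must not be negative")
    if code_point < 0x80:
        # 0-127: one byte with the same value as the code point.
        return bytes([code_point])
    if code_point < 0x800:
        # 128-2047: lead byte 110xxxxx, then continuation byte 10xxxxxx.
        lead = 0xC0 | (code_point >> 6)
        trail = 0x80 | (code_point & 0x3F)
        return bytes([lead, trail])
    raise ValueError("this sketch only covers code points below $800")


print(encode_utf8_small(0x41).hex())  # 41
print(encode_utf8_small(0xA0).hex())  # c2a0
```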
Anything above 127 becomes a question mark.
Therefore my symbol displays as a question mark.
IMHO that's a silly FPC limitation.
Maybe you underestimate FPC.
FPC supports various source encodings. Lazarus uses UTF-8 by default.
[...]
There are some useful UTF-8 functions in the units LazUTF8 and LazFileUtils.
Thanks to everybody for the suggestions.
Giuliano
--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus