Graeme Geldenhuys wrote:
Thanks Marc,
It was a really informative explanation and helped a lot. Now let's see
if I understood it correctly. :-)
AnsiString elements are 1 byte wide and WideString elements are 2 bytes
wide; this width is fixed. The Delphi WideStrings are the same as those
used by MS in their Wide functions, which were initially UCS-2. So
(initially) all Unicode characters would fit in there.
Now this brings me to another point that makes no sense: the naming
convention of functions.
Let's look at the following RTL functions as an example:
* StrPos is used for 1-byte (8-bit) ANSI strings.
* AnsiStrPos is used for multi-byte (or is that 2 bytes max?) UTF-8
strings, a.k.a. WideString.
A UTF-8 encoded character can take up to 4 bytes (RFC 3629; the original
definition allowed up to 6).
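To make the byte counts concrete, here is a minimal Free Pascal sketch that derives the length of a UTF-8 sequence from its lead byte, assuming the RFC 3629 rules, and uses it to count the characters in a UTF-8 encoded AnsiString. The helper name Utf8SeqLen is hypothetical; it is not an RTL function.

```pascal
program Utf8Len;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte B, following RFC 3629 (at most 4 bytes).
// Returns 0 for a continuation byte or a byte that can never start a sequence.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1        // 0xxxxxxx: plain ASCII
  else if B < $C2 then Result := 0   // 10xxxxxx continuation, or overlong $C0/$C1
  else if B < $E0 then Result := 2   // 110xxxxx
  else if B < $F0 then Result := 3   // 1110xxxx
  else if B < $F5 then Result := 4   // 11110xxx, up to U+10FFFF
  else Result := 0;                  // $F5..$FF never occur in valid UTF-8
end;

var
  S: AnsiString;
  I, N, Chars: Integer;
begin
  // 'A' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes)
  S := 'A' + Chr($C3) + Chr($A9) + Chr($E2) + Chr($82) + Chr($AC)
     + Chr($F0) + Chr($9F) + Chr($98) + Chr($80);
  I := 1;
  Chars := 0;
  while I <= Length(S) do
  begin
    N := Utf8SeqLen(Ord(S[I]));
    if N = 0 then
      N := 1;                        // step over an invalid byte
    Inc(Chars);
    Inc(I, N);
  end;
  WriteLn('bytes: ', Length(S), ', characters: ', Chars);  // bytes: 10, characters: 4
end.
```

Length(S) counts bytes (elements), which is why the loop is needed to count characters.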
WideStrings are strings where each element consists of 1 word (= 2
bytes). Note that I call it an element and not a character, since a
character may be built up from one or more elements.
So why did Borland name it AnsiStrPos when it doesn't operate on ANSI
strings?! Why not name it Utf8StrPos or WideStrPos? The Ansi* prefix
completely contradicts what it operates on! It doesn't work with ANSI
strings, but rather with WideStrings.
Nope, it does work on AnsiStrings.
Summary:
AnsiString: array of 1-byte elements
WideString: array of 1-word (= 2-byte) elements
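A small Free Pascal sketch of the element/character distinction, assuming FPC with {$mode objfpc}: the element sizes are fixed, but Length counts elements, so a single character can occupy two elements in either string type (two UTF-8 bytes in an AnsiString, a UTF-16 surrogate pair in a WideString).

```pascal
program Elements;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  W: WideString;
begin
  // Element sizes are fixed by the types.
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));  // 1
  WriteLn('SizeOf(WideChar) = ', SizeOf(WideChar));  // 2

  // U+00E9 stored as UTF-8 in an AnsiString: 1 character, 2 elements.
  SetLength(A, 2);
  A[1] := Chr($C3);
  A[2] := Chr($A9);
  WriteLn('Length(A) = ', Length(A));                // 2

  // U+1F600 stored as a UTF-16 surrogate pair: 1 character, 2 elements.
  SetLength(W, 2);
  W[1] := WideChar($D83D);
  W[2] := WideChar($DE00);
  WriteLn('Length(W) = ', Length(W));                // 2
end.
```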
Now here is another piece of code, with parts removed for simplicity.
To protect the innocent, we will keep the author anonymous. ;-)
The code below is a function that outputs Unicode or ANSI text to a canvas.
Are my assumptions about this code correct?
1... I assume that the String type, as used in the parameter "AText:
String", can hold ANSI or Unicode text. The function supports both,
but isn't sure what it is going to get.
Yes
2... The function Utf8ToUnicode tells me that it is going to convert a
String (AText) to Unicode, decoding it as UTF-8.
It assumes the string is UTF-8 encoded, and it returns the number of
WideChars (including the terminating #0) needed to hold this string as a
WideString, not the number of bytes.
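A minimal sketch of the two-pass pattern, assuming Free Pascal's system-unit Utf8ToUnicode(Dest, Source, MaxChars), which, as far as I can tell from the FPC RTL, returns the number of WideChars needed including the terminating #0 when Dest is nil. Because the result is a count of elements, the buffer must be Count * SizeOf(WideChar) bytes, not Count bytes:

```pascal
program Utf8ToWide;
{$mode objfpc}{$H+}

var
  Utf8: AnsiString;
  Count: SizeInt;
  Buf: PWideChar;
begin
  // 'h', U+00E9 (2 UTF-8 bytes), 'j': 4 bytes, 3 characters
  Utf8 := 'h' + Chr($C3) + Chr($A9) + 'j';

  // First pass: nil destination, nothing is written; only the size is computed.
  Count := Utf8ToUnicode(nil, PChar(Utf8), 0);
  WriteLn('WideChars needed, including #0: ', Count);  // 4

  // Allocate Count elements of 2 bytes each.
  Buf := GetMem(Count * SizeOf(WideChar));
  try
    // Second pass: decode into the buffer (at most Count elements, #0 included).
    Utf8ToUnicode(Buf, PChar(Utf8), Count);
    WriteLn('second element: ', Ord(Buf[1]));          // 233 (U+00E9)
  finally
    FreeMem(Buf);
  end;
end.
```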
3... The function Utf8ToAnsi tells me that it is going to convert a
String (AText), which might contain UTF-8 encoded text, to 8-bit ANSI
text.
Yes.
4... If this code only supported Unicode (UTF-8), we could have defined
the parameter AText as UTF8String and removed the second part of the if
statement inside the function.
The problem is that Windows only supports 2 encodings:
1) Ansi: each char is coded in exactly 1 byte (ignoring the double-byte
Far East code pages).
2) Wide: each char used to be coded in exactly 1 word (= 2 bytes), the
old UCS-2 interpretation. Windows 2000 and later support UTF-16, where a
character outside the Basic Multilingual Plane takes two words (a
surrogate pair).
So you call either Windows.TextOutA or Windows.TextOutW.
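Under UTF-16, a character above U+FFFF no longer fits in one WideChar. The following Free Pascal sketch shows how such a code point is split into a surrogate pair; EncodeUtf16 is a hypothetical helper, not part of the RTL or the Windows API, and it assumes the input is a valid code point (at most $10FFFF and not itself in the surrogate range $D800..$DFFF).

```pascal
program Surrogates;
{$mode objfpc}{$H+}

// Hypothetical helper: encode code point CP as UTF-16 into Hi/Lo and
// return the number of WideChar elements used (1 or 2).
function EncodeUtf16(CP: LongWord; out Hi, Lo: WideChar): Integer;
begin
  if CP < $10000 then
  begin
    Hi := WideChar(CP);                    // one element, same as UCS-2
    Lo := #0;
    Result := 1;
  end
  else
  begin
    Dec(CP, $10000);                       // 20 significant bits remain
    Hi := WideChar($D800 + (CP shr 10));   // high surrogate: top 10 bits
    Lo := WideChar($DC00 + (CP and $3FF)); // low surrogate: bottom 10 bits
    Result := 2;
  end;
end;

var
  Hi, Lo: WideChar;
  N: Integer;
begin
  N := EncodeUtf16($1F600, Hi, Lo);
  WriteLn(N, ' elements: $', HexStr(Ord(Hi), 4), ' $', HexStr(Ord(Lo), 4));
  // 2 elements: $D83D $DE00
end.
```

A UCS-2 reading of the same text would see two unrelated 16-bit values instead of one character.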
Note:
In the C headers, TextOut is a macro that expands to either TextOutA or
TextOutW, depending on whether UNICODE is defined, i.e. whether you want
the Ansi or the Wide API.
In most Pascal translations, the function TextOut is mapped to TextOutA.
Marc
Is all this correct?
--------------------------------
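procedure TGDICanvas.DoTextOut(const AText: String);
var
  Size: SizeInt;         // missing from the snippet as posted
  WideText: PWideChar;
  AnsiText: string;
begin
  if UnicodeEnabledOS then
  begin
    // With a nil destination, Utf8ToUnicode only counts: the result is the
    // number of WideChars needed, including the terminating #0 (not bytes).
    Size := Utf8ToUnicode(nil, PChar(AText), 0);
    WideText := GetMem(Size * SizeOf(WideChar));
    try
      Utf8ToUnicode(WideText, PChar(AText), Size);
      // Size includes the #0, so the character count for TextOutW is Size - 1.
      dynWindows.TextOutW(.... WideText....);
    finally
      FreeMem(WideText);
    end;
  end
  else
  begin
    // Convert the UTF-8 text to the current ANSI code page first.
    AnsiText := Utf8ToAnsi(AText);
    Windows.TextOut(.... PChar(AnsiText) .....);
  end;
end;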
--------------------------------
Regards,
- Graeme -
_________________________________________________________________
To unsubscribe: mail [EMAIL PROTECTED] with
"unsubscribe" as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives
_________________________________________________________________