Okay, take the first Unicode emoticon from the
[url=http://vk.com/pages?oid=-42154384&p=%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA]VK.com list[/url],
U+1F60A (128522 decimal). I can copy it and paste it into PSPad, where it shows
up as a "rectangle", then save the file as UTF-16. That puts two UTF-16 code
units in the file, a surrogate pair:
D83D DE0A
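
For reference, the pair can be derived by hand from the code point. The following is only a minimal sketch of the standard UTF-16 arithmetic, not code taken from PSPad or the Delphi RTL; ToSurrogates is a hypothetical helper name, and it assumes the input is above U+FFFF:

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// ToSurrogates is a hypothetical helper, not an RTL function.
// It splits a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
procedure ToSurrogates(CodePoint: Integer; out HighUnit, LowUnit: Integer);
var
  V: Integer;
begin
  V := CodePoint - $10000;            // leaves a 20-bit value
  HighUnit := $D800 or (V shr 10);    // upper 10 bits
  LowUnit := $DC00 or (V and $3FF);   // lower 10 bits
end;

var
  HighUnit, LowUnit: Integer;
begin
  ToSurrogates($1F60A, HighUnit, LowUnit);
  WriteLn(IntToHex(HighUnit, 4), ' ', IntToHex(LowUnit, 4)); // D83D DE0A
end.
```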

So far so good. If I reopen this file in PSPad and save it as UTF-8, I get
two 3-byte sequences:
ED A0 BD ED B8 8A

This is not true UTF-8 but CESU-8 (Compatibility Encoding Scheme for UTF-16:
8-Bit), which encodes each surrogate separately. Some software accepts it, but
other Unicode-aware editors, like the one built into Far Manager, produce true
UTF-8, a single 4-byte sequence starting with $Fx:
F0 9F 98 8A
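
Both byte sequences follow from the same bit-packing rule applied to different inputs: CESU-8 packs each surrogate on its own, while true UTF-8 packs the whole code point. This is a rough sketch for illustration only; Encode3Hex and Encode4Hex are hypothetical names, and the functions do no range checking:

```pascal
program Utf8VsCesu8;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: hex dump of the 3-byte form for a value in $0800..$FFFF.
// Feeding it a surrogate gives the CESU-8 bytes.
function Encode3Hex(W: Integer): string;
begin
  Result := IntToHex($E0 or (W shr 12), 2) + ' ' +
            IntToHex($80 or ((W shr 6) and $3F), 2) + ' ' +
            IntToHex($80 or (W and $3F), 2);
end;

// Hypothetical helper: hex dump of the 4-byte UTF-8 form for $10000..$10FFFF.
function Encode4Hex(CP: Integer): string;
begin
  Result := IntToHex($F0 or (CP shr 18), 2) + ' ' +
            IntToHex($80 or ((CP shr 12) and $3F), 2) + ' ' +
            IntToHex($80 or ((CP shr 6) and $3F), 2) + ' ' +
            IntToHex($80 or (CP and $3F), 2);
end;

begin
  // CESU-8: each surrogate encoded separately
  WriteLn(Encode3Hex($D83D), ' ', Encode3Hex($DE0A)); // ED A0 BD ED B8 8A
  // UTF-8: the code point encoded as a whole
  WriteLn(Encode4Hex($1F60A));                        // F0 9F 98 8A
end.
```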

PSPad cannot decode that file.

Since Windows XP and newer fully support UTF-16, the problem is in Delphi
itself:

function Utf8Decode(const Source: UTF8String): WideString; // my function
var
  L: Integer;
  Dest: WideString;
begin
  L := Length(Source);
  SetLength(Dest, L);
  SetLength(Dest, MultiByteToWideChar(CP_UTF8, 0, Pointer(Source), L,
Pointer(Dest), L));
  Result := Dest;
end;

function Utf8Encode(const Source: WideString): UTF8String; // my function
var
  L: Integer;
  Dest: UTF8String;
begin
  L := Length(Source);
  SetLength(Dest, L * 3);
  SetLength(Dest, WideCharToMultiByte(CP_UTF8, 0, Pointer(Source), L,
Pointer(Dest), Length(Dest), nil, nil));
  Result := Dest;
end;

procedure TMainForm.Button1Click(Sender: TObject);
const
  Smile = #$F0#$9F#$98#$8A; // 4-byte UTF-8
begin
  MessageBoxW(Handle, Pointer(Utf8Decode(Smile)), nil, 0); // one char, good
  // The built-in function returns an empty string here, which is wrong
  MessageBoxW(Handle, Pointer(System.Utf8Decode(Smile)), nil, 0);
end;

When compiled with Delphi XE2, the built-in function also produces a surrogate
pair, because Embarcadero fixed System.Utf8Decode in the Unicode-aware Delphi
versions.

I was a little wrong about UTF-8 in Windows XP: both surrogates and 4-byte UTF-8
work fine, but we must pass zero as the dwFlags parameter of
MultiByteToWideChar, as MSDN says.

-- 
<http://forum.pspad.com/read.php?4,45099,64630>
PSPad freeware editor http://www.pspad.com