Okay, take the first Unicode emoticon from the [url=http://vk.com/pages?oid=-42154384&p=%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA]VK.com list[/url], U+1F60A (decimal 128522). I can copy it and paste it (it shows as a "rectangle") into PSPad, then save the file as UTF-16. This produces two UTF-16 code units in the file, a surrogate pair: D83D DE0A
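For reference, the surrogate pair above can be reproduced by hand. The following is a minimal sketch in Python (used here only because the arithmetic is short to write; it is not Delphi code and not what PSPad does internally), and the function name to_surrogate_pair is hypothetical:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x1F60A)
print(f"{high:04X} {low:04X}")  # D83D DE0A
```

The same split applies to any code point above U+FFFF, which is why such characters always take two 16-bit units in a UTF-16 file.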
It's great! I can reopen this file in PSPad and save it as UTF-8. This produces two 3-byte sequences:
ED A0 BD ED B8 8A
This is not true UTF-8 but CESU-8 (Compatibility Encoding Scheme for UTF-16: 8-Bit), in which each surrogate is encoded separately. Some software accepts it, but other Unicode-aware editors, such as the one built into Far Manager, produce true UTF-8, a single 4-byte sequence starting with $Fx:
F0 9F 98 8A
PSPad cannot decode that file. Because Windows XP and newer fully support UTF-16, the problem is in Delphi itself:
[code]
function Utf8Decode(const Source: UTF8String): WideString; // my function
var
  L: Integer;
  Dest: WideString;
begin
  L := Length(Source);
  // UTF-16 never needs more code units than the UTF-8 input has bytes
  SetLength(Dest, L);
  SetLength(Dest, MultiByteToWideChar(CP_UTF8, 0, Pointer(Source), L, Pointer(Dest), L));
  Result := Dest;
end;

function Utf8Encode(const Source: WideString): UTF8String; // my function
var
  L: Integer;
  Dest: UTF8String;
begin
  L := Length(Source);
  // each UTF-16 code unit needs at most 3 UTF-8 bytes
  SetLength(Dest, L * 3);
  SetLength(Dest, WideCharToMultiByte(CP_UTF8, 0, Pointer(Source), L, Pointer(Dest), Length(Dest), nil, nil));
  Result := Dest;
end;

procedure TMainForm.Button1Click(Sender: TObject);
const
  Smile = #$F0#$9F#$98#$8A; // 4-byte UTF-8
begin
  MessageBoxW(Handle, Pointer(Utf8Decode(Smile)), nil, 0); // one char, good
  MessageBoxW(Handle, Pointer(System.Utf8Decode(Smile)), nil, 0); // null string, wrong
end;
[/code]
When compiled under Delphi XE2, the built-in function also produces a surrogate pair, because Embarcadero fixed System.Utf8Decode in the Unicode-aware versions of Delphi.
I was a little wrong about UTF-8 in Windows XP: both surrogates and 4-byte UTF-8 work fine, but we should pass zero as the Flags parameter to MultiByteToWideChar, as MSDN says.
--
<http://forum.pspad.com/read.php?4,45099,64630>
PSPad freeware editor http://www.pspad.com
