Hi list, I'm diving into properly dealing with UTF8 encoded data coming in from my Twitter application/library (see FPC list announcement yesterday).
Decided to go over the Laz wiki page on Unicode and cleaned up some typos etc.:
http://wiki.lazarus.freepascal.org/LCL_Unicode_Support

I've got some questions:

- "Widestrings and Ansistrings" section
  What is a widestring? A UTF-16 encoded string? Or a string that can hold multiple bytes per character, with the encoding otherwise undefined?

- "Searching a substring" section
  Searching UTF8 substrings has this code:

    uses lazutf8;
    ...
    BytePos:=Pos(SearchFor,aText);
    CharacterPos:=UTF8Length(PChar(aText),BytePos-1);
    writeln('The substring "',SearchFor,'" is in the text "',aText,'"',
            ' at byte position ',BytePos,' and at character position ',CharacterPos);

  It says: "Due to the special nature of UTF8 you can simply use the normal string functions"
  => Is the special nature the fact that there is only a single way to decode UTF8 characters, because multi-byte encodings have different high bits set (as explained in the description of UTF8 at the bottom of the page)?

- "No Unicode support on Win9x" section
  "Windows platforms <=Win9x [..] only partially support Unicode"
  however:
  "Win 9x and NT offer two parallel sets of API functions:[..]the new, Unicode enabled *W."
  Presumably not all *A functions are available as *W functions on Win9x, which is why Unicode is not fully supported? (I can also imagine the default fonts etc. do not support showing Unicode characters.)

Thanks for any clarification; I'll update the wiki...

Thanks,
Reinier
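
P.S. To make the substring question concrete, here is a minimal sketch of what I think the "special nature" means. It is not the wiki's code: CodePointCount is a hypothetical helper of my own that stands in for UTF8Length, so the program needs only the FPC RTL, and it assumes the input is valid UTF-8. It relies on continuation bytes always having the bit pattern 10xxxxxx, so they can never be mistaken for the start of a character. The "+ 1" is my choice to report a 1-based character position.

```pascal
program utf8pos;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of LazUTF8): counts the UTF-8 code points
// that start within the first ByteCount bytes of S. Continuation bytes
// (bit pattern 10xxxxxx) are skipped; every other byte starts a code point.
function CodePointCount(const S: string; ByteCount: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to ByteCount do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  aText, SearchFor: string;
  BytePos, CharacterPos: Integer;
begin
  // Byte escapes keep the example independent of the source file encoding.
  aText := 'na'#$C3#$AF've caf'#$C3#$A9;   // "naive cafe" with i-diaeresis and e-acute, in UTF-8
  SearchFor := 'caf'#$C3#$A9;              // "cafe" with e-acute
  // Plain byte-wise search: a valid UTF-8 needle can only match at a
  // character boundary of a valid UTF-8 haystack.
  BytePos := Pos(SearchFor, aText);
  // Characters before the match, plus one for a 1-based character position.
  CharacterPos := CodePointCount(aText, BytePos - 1) + 1;
  writeln('byte position ', BytePos, ', character position ', CharacterPos);
end.
```

If my understanding is right, this should report byte position 8 and character position 7, because the two-byte character before the match shifts the byte offset by one.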
