On 2013-04-18 09:31, Martin Schreiber wrote:
>
> It counts the number of a known constant Russian character in a random
> string.
> In utf-16 and UCS4 this is an operation with numbers and string index,

Is that Russian character encoded as a surrogate pair? Remember that you
can't rely on mystring[i] even for UTF-16 encoded text, because that will
*not* check whether you are looking at the first or second half of a
surrogate pair. The index reference idea might work for UCS-2 (like in
MSEgui), which only supports 2-byte characters, but it does not hold for
UTF-16 (correctly implemented).

If Ivanko can share the original text and test code, I would like to run
the test myself. I also bet that they use indexed references into the text
and disregard surrogate pairs (thus not fully supporting UTF-16), like
most programmers do.

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
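
P.S. To illustrate the surrogate pair point, here is a minimal sketch. It is in Python only to keep it short; it is not the test code under discussion. The function names and the sample string are made up for this example. It treats the text as a list of 16-bit code units, which is what mystring[i] gives you on a UTF-16 string, and shows how the code unit count and the character count drift apart as soon as a character outside the BMP appears.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def count_code_point(units, target):
    """Count occurrences of code point `target`, decoding surrogate pairs."""
    count = 0
    i = 0
    while i < len(units):
        cp = units[i]
        # A high surrogate followed by a low surrogate forms one code point.
        if (0xD800 <= cp <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 1  # step over the low surrogate too
        if cp == target:
            count += 1
        i += 1
    return count


def count_code_unit(units, target_unit):
    """Naive per-index comparison, in the style of mystring[i]."""
    return sum(1 for u in units if u == target_unit)


# Hypothetical sample: U+0436, U+0430, U+0436, U+1D11E (outside the BMP), U+0436
sample = "\u0436\u0430\u0436\U0001D11E\u0436"
units = utf16_units(sample)

print(len(sample), len(units))             # code points vs. code units
print(count_code_point(units, 0x0436))     # surrogate-aware count of U+0436
print(count_code_point(units, 0x1D11E))    # found only by decoding the pair
```

With this sample the string holds 5 characters but 6 code units, so any index computed after the non-BMP character is off by one if surrogates are ignored. A character that fits in a single code unit can still be counted by plain comparison, because surrogate values never coincide with BMP characters; it is indexing and length that go wrong.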

