On 2013-04-18 09:31, Martin Schreiber wrote:
>
> It counts the number of a known constant Russian character in a random
> string.
> In UTF-16 and UCS4 this is an operation with numbers and string index,

Does that Russian character need a surrogate pair? Remember that you
can't use mystring[i] to fetch the i-th character even in UTF-16
encoded text, because that will *not* check whether you are looking at
the first or second half of a surrogate pair. The index idea might work
for UCS-2 (as in MSEgui), which only supports characters that fit in 2
bytes, but it does not hold for a correctly implemented UTF-16.
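To make the point concrete, here is a small sketch in Python (chosen only
for brevity). The helper names utf16_units and unit_offset_of_char are
hypothetical and do not come from MSEgui or fpGUI. It encodes a string
into UTF-16 code units and shows that code unit i is not the same thing
as character i once a character outside the BMP appears. The sample
string places U+1D11E (MUSICAL SYMBOL G CLEF), which needs a surrogate
pair, before the Cyrillic letter U+0436, which fits in a single unit.

```python
def utf16_units(text):
    """Hypothetical helper: return the UTF-16 code units of text as ints."""
    data = text.encode("utf-16-le")  # no BOM, little-endian byte order
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def unit_offset_of_char(units, char_index):
    """Hypothetical helper: code-unit offset where character char_index starts.

    Walks from the start, stepping over a surrogate pair as one character.
    """
    offset = 0
    for _ in range(char_index):
        if offset >= len(units):
            raise IndexError("character index out of range")
        if (is_high_surrogate(units[offset])
                and offset + 1 < len(units)
                and is_low_surrogate(units[offset + 1])):
            offset += 2  # surrogate pair: one character, two code units
        else:
            offset += 1  # BMP character (or lone surrogate): one code unit
    if offset >= len(units):
        raise IndexError("character index out of range")
    return offset


units = utf16_units("a\U0001D11E\u0436")  # 'a', G clef, Cyrillic zhe
print([hex(u) for u in units])            # naive units[2] is a low surrogate
print(unit_offset_of_char(units, 2))      # where character 2 really starts
```

For this string the code units are 0x61, 0xD834, 0xDD1E, 0x436, so a
naive units[2] returns 0xDD1E (the second half of the pair), while the
Cyrillic letter actually sits at offset 3. Finding that offset requires
walking the string from the start, so character indexing into UTF-16 is
O(n) rather than O(1).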

If Ivanko can share the original text and test code, I would like to
test it myself. I also bet that they use indexed references into the
text and disregard surrogate pairs (thus not fully supporting UTF-16),
as most programmers do.
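For a count like the one in the benchmark, a surrogate-aware version has
to decode pairs while scanning instead of comparing the raw code unit at
each index. Below is a minimal Python sketch. It is not Ivanko's test
code, which I have not seen, and the names utf16_units and
count_code_point are hypothetical. It walks the UTF-16 code units,
combines each valid high/low pair into a single code point, and counts
the matches against the target.

```python
def utf16_units(text):
    """Hypothetical helper: return the UTF-16 code units of text as ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def count_code_point(units, target):
    """Hypothetical helper: count occurrences of code point target in units."""
    count = 0
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid surrogate pair: combine the two halves into one code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            cp = unit
            i += 1
        if cp == target:
            count += 1
    return count


units = utf16_units("\u0436a\u0436\U0001D11E\u0436")
print(count_code_point(units, 0x0436))   # Cyrillic zhe
print(count_code_point(units, 0x1D11E))  # the non-BMP G clef, counted once
```

Lone surrogates are passed through as single units here. A stricter
implementation might reject them as malformed UTF-16 instead.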


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/


_______________________________________________
mseide-msegui-talk mailing list
https://lists.sourceforge.net/lists/listinfo/mseide-msegui-talk
