On Sunday 14 April 2013 08:54:31 Ivanko B wrote:
> I don't think you would like me to
> make an "instant switch" to that. ;-)
> ================
> Then UTF16 with surrogate pairs? Won't it have significant
> performance issues like UTF8 does? Some performance testing (with
> testcases written by the FreepascalRu guys) has revealed that UTF8 is
> several times or even orders of magnitude slower than UCS2.
>
I don't know whether it is useful to repeat this, because it seems nobody reads
or believes it. Anyway, once again.
The code units of UTF-16 surrogate pairs (0xD800..0xDFFF) are not used for any
UCS2 character -> there is no problem with having surrogate pairs in a UCS2
string as long as a pair is not split in the middle. If one searches a UTF-16
string for a character constant by character index, one knows that it is a
single 16 bit code unit if the searched character is in the BMP (Basic
Multilingual Plane). Searching for a substring with surrogate pairs works
without further measures. What is more difficult is to determine the count of
glyphs in a string. That is difficult in UCS4 too, because there could be
decomposed characters in the string.
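
A minimal sketch of the points above, in Python rather than Pascal and not taken from MSEgui. The helper names (utf16_units, find_units, count_code_points) are made up for this illustration. It treats a string as a plain list of 16 bit code units, searches it unit by unit, and shows that counting needs extra care both for surrogate pairs and for decomposed characters.

```python
import struct
import unicodedata

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def find_units(haystack, needle):
    """Naive search on 16 bit code units; returns the unit index or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1

def count_code_points(units):
    """Count code points by skipping low (trailing) surrogates 0xDC00..0xDFFF."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

# 'a', U+1D11E (outside the BMP, needs a surrogate pair), 'b', 'é'
text = "a\U0001D11Eb\u00e9"
units = utf16_units(text)

# A BMP character is a single code unit and cannot match half of a pair,
# because surrogate units never encode a BMP character.
pos_b = find_units(units, utf16_units("b"))

# A substring containing a surrogate pair is found by the same search.
pos_clef = find_units(units, utf16_units("\U0001D11E"))

# Counting is where it gets harder: 5 code units, but 4 code points.
n_units = len(units)
n_points = count_code_points(units)

# Even with one unit per code point (UCS4) the glyph count can differ:
# decomposed 'é' is 'e' plus a combining acute accent, two code points.
decomposed = unicodedata.normalize("NFD", "\u00e9")
n_decomposed = len(decomposed)
```

The search never has to decode the pair. Only an operation that counts or splits the string needs to know about surrogates, and counting glyphs additionally needs to know about combining characters.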

Martin

_______________________________________________
mseide-msegui-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mseide-msegui-talk
