http://bugzilla.novell.com/show_bug.cgi?id=480178
http://bugzilla.novell.com/show_bug.cgi?id=480178#c31

Damien Diederen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #359241|0                           |1
        is obsolete|                            |

--- Comment #31 from Damien Diederen 2010-05-14 20:56:26 UTC ---
Created an attachment (id=362384)
 --> (http://bugzilla.novell.com/attachment.cgi?id=362384)
System.Char: Handle astral planes in GetUnicodeCategory(string,int)

If the string element at index starts a surrogate pair, we decode the
full codepoint and "query" the higher planes of the database.

This commit fixes #480178

COMPATIBILITY

The updated Mono runtime has been verified to produce the same results
as Microsoft's; here are the MD5 sums of the Unicode category database
dumps from each runtime:

  eba45e00acdc82f9a08873465110aef4  v2.0.50727.dump
  eba45e00acdc82f9a08873465110aef4  v3.5.21022.dump
  56fd5c828fbb9083693835680667fd2c  v4.0.30319.dump
  eba45e00acdc82f9a08873465110aef4  gmcs.dump
  56fd5c828fbb9083693835680667fd2c  dmcs.dump

(Generated via create-category-table --dump, compiled and executed under
the relevant runtime.) The gmcs dump matches the v2.0 and v3.5 dumps,
and the dmcs dump matches the v4.0 one.

PERFORMANCE

The simple data access pattern suggested by Paolo Molaro is fairly
efficient. Below are timings for a simple loop that fetches the category
code of every codepoint in "Range", repeated "Iterations" times
(Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz; best of three runs):

| Range       | Iterations | Linear table | 2.0+  | 4.0   |
|-------------+------------+--------------+-------+-------|
| 0000-00FF   |     256000 | 0.30s        | 0.35s | 0.37s |
| 0000-FFFF   |      16000 | 4.75s        | 5.54s | 5.82s |
| 0000-10FFFF |       1000 | N/A          | 5.63s | 6.19s |
|-------------+------------+--------------+-------+-------|
| Data size   |            | 64kB         | 30kB  | 48kB  |

The linear table holds one byte per BMP codepoint (hence 64kB), so it
cannot answer queries beyond U+FFFF. In the table above, 2.0+ denotes a
mode compatible with versions v2.0.50727 through v3.5.21022 of
Microsoft's framework, whereas 4.0 mimics v4.0.30319.
The former is used by programs compiled by 'mcs', 'gmcs' and 'smcs';
the latter by programs compiled by 'dmcs'. (The difference in
performance between these modes is probably due to a change in memory
access patterns: the 4.0 table shares "pages" with the 2.0 one, causing
accesses to be more spread out.)
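A minimal Python sketch of the surrogate-pair step in the commit message
above: when the UTF-16 code unit at index is a high surrogate followed by
a low surrogate, the pair is combined into one supplementary-plane
codepoint before the category lookup. The helper names are hypothetical,
the code works on a list of UTF-16 code units rather than a .NET string,
and Python's unicodedata module stands in for the runtime's category
database (its Unicode version may differ from Mono's or Microsoft's).

```python
import unicodedata

HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def to_utf16_units(text):
    """Split a Python string into UTF-16 code units (no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_at(units, index):
    """Codepoint starting at units[index].

    A high surrogate followed by a low surrogate is decoded into a single
    codepoint above U+FFFF; any other unit, including a lone surrogate,
    is returned unchanged.
    """
    hi = units[index]
    if HIGH_START <= hi <= HIGH_END and index + 1 < len(units):
        lo = units[index + 1]
        if LOW_START <= lo <= LOW_END:
            return 0x10000 + ((hi - HIGH_START) << 10) + (lo - LOW_START)
    return hi


def unicode_category(units, index):
    """Hypothetical analogue of GetUnicodeCategory(string, int)."""
    # unicodedata is only a stand-in for the runtime's category table.
    return unicodedata.category(chr(code_point_at(units, index)))
```

For U+1D400 (MATHEMATICAL BOLD CAPITAL A), encoded as D835 DC00, index 0
yields "Lu", while index 1, a lone low surrogate, yields "Cs". A lookup
that only inspects the single code unit at index 0 would report "Cs",
which is the behaviour the commit replaces.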

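The two-level "page" layout referred to in the performance notes can be
sketched as follows. This is an assumed reconstruction, not Mono's actual
table format: codepoints are split into hypothetical 256-entry pages,
identical pages are stored once in a shared pool, and each table keeps
only a first-level index of page ids. Building a second table against
the same pool reuses every unchanged page, which is one way two tables
could "share pages". Category codes follow the order of .NET's
UnicodeCategory enumeration; unicodedata again stands in for the real
database.

```python
import unicodedata

# Unicode general categories, in the order of .NET's
# System.Globalization.UnicodeCategory values (0 = UppercaseLetter, ...).
CATEGORIES = ["Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl",
              "No", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Pc", "Pd",
              "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So", "Cn"]
CODES = {name: i for i, name in enumerate(CATEGORIES)}

PAGE_BITS = 8                     # assumed page size: 256 codepoints
PAGE_SIZE = 1 << PAGE_BITS
LIMIT = 0x110000                  # one past U+10FFFF


def category_code(cp):
    """Category of a codepoint as a small integer (stand-in database)."""
    return CODES[unicodedata.category(chr(cp))]


class PagePool:
    """Deduplicated second-level pages, shareable between tables."""

    def __init__(self):
        self.pages = []           # distinct pages, one byte per codepoint
        self._ids = {}            # page contents -> position in self.pages

    def intern(self, page):
        key = bytes(page)
        if key not in self._ids:
            self._ids[key] = len(self.pages)
            self.pages.append(key)
        return self._ids[key]


def build_index(pool, category_of):
    """First-level index: page number -> id of that page in the pool."""
    return [pool.intern([category_of(cp) for cp in range(start, start + PAGE_SIZE)])
            for start in range(0, LIMIT, PAGE_SIZE)]


def lookup(pool, index, cp):
    """Two dependent loads: the page id, then the byte inside the page."""
    return pool.pages[index[cp >> PAGE_BITS]][cp & (PAGE_SIZE - 1)]


def data_size(pool, indexes):
    """Bytes used, assuming 16-bit page ids in each first-level index."""
    return len(pool.pages) * PAGE_SIZE + sum(2 * len(ix) for ix in indexes)
```

Each lookup costs two loads instead of the single load of a linear
table, which is consistent with the modest slowdown in the timings. A
hypothetical variant built on the same pool (say, one that treats U+20BF
as unassigned) adds at most one new page, so it is cheap in space, but
its lookups may touch pages scattered across the pool.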