I should preface this by saying that I don't know Mandarin, so I'm working rather blind. I want to make sure that a list of filenames/track names/artists/albums is sorted correctly for Chinese users. My understanding is that the most commonly expected sort order is based on the pinyin transcription of the characters.
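To make the distinction concrete: the CJK Unified Ideographs are encoded roughly by radical and stroke count, not by pronunciation. An ordinal (code-point) sort and a pinyin sort can therefore give very different results. The minimal Python sketch below contrasts the two on four characters. The `PINYIN` table is a hypothetical, hand-written sample covering only these characters, and it ignores tones. A real pinyin collation needs a full dictionary, has to handle characters with several readings, and would normally come from a collation library rather than code like this.

```python
# Contrast an ordinal (code-point) sort with a pinyin-based sort.
# PINYIN is a hypothetical, hand-written sample table for this sketch only.
PINYIN = {
    "阿": "a",
    "北": "bei",
    "文": "wen",
    "中": "zhong",
}

def pinyin_key(s):
    # Sort key: the pinyin of each character; characters missing from the
    # sample table fall back to the character itself.
    return [PINYIN.get(ch, ch) for ch in s]

words = ["中", "文", "阿", "北"]

code_point_order = sorted(words)              # Python compares str by code point
pinyin_order = sorted(words, key=pinyin_key)

print(code_point_order)  # ['中', '北', '文', '阿']
print(pinyin_order)      # ['阿', '北', '文', '中']
```

Here the code-point order is almost the reverse of the pinyin order, so the two kinds of collation are easy to tell apart.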
I've been investigating how strings are sorted in various cultures in .NET, and I've found that Mono and .NET give different results for the "zh-Hans" culture. From what I've read, "zh-Hans" should just be another name for the "zh-CHS" culture, so I expect the same results for both. Mono, however, gives different results.

Here's my short test program: http://pastebin.com/kTL9QuLS

Here's my output on .NET: http://pastebin.com/D5Hp6GjA

On .NET, in both the zh-Hans and the zh-CHS cultures, the example strings are sorted in an order consistent with their pinyin transcriptions, which is what I expect.

Here's my output on Mono 3.0.10, running on Ubuntu: http://pastebin.com/jMB0FdkP

This time, I get the same result as .NET for zh-CHS. For zh-Hans, however, I get a different order: it *looks* like the strings are simply ordered by Unicode code point. I'm surprised to see different sort orders for zh-CHS and zh-Hans on the same setup, and surprised that Mono differs from .NET.

I also tried an older version of Mono, 2.10.8, as distributed with Ubuntu: http://pastebin.com/BmEepAmc

This gives me the expected sort order for both zh-Hans and zh-CHS, but it reports the culture display name as simply "Chinese" in each case, instead of the expected "Chinese (Simplified) Legacy" and "Chinese (Simplified)" respectively.

Finally, I've summarized the results in a table:

Runtime       Requested culture  Culture display name                     Collation order for Chinese characters
.NET 4.0      invariant          Invariant Language (Invariant Country)   code-point
.NET 4.0      zh-CHS             Chinese (Simplified) Legacy              pinyin
.NET 4.0      zh-Hans            Chinese (Simplified)                     pinyin
Mono 2.10.8   invariant          Invariant Language (Invariant Country)   code-point
Mono 2.10.8   zh-CHS             Chinese                                  pinyin
Mono 2.10.8   zh-Hans            Chinese                                  pinyin
Mono 3.0.10   invariant          Invariant Language (Invariant Country)   code-point
Mono 3.0.10   zh-CHS             Chinese (Simplified) Legacy              pinyin
Mono 3.0.10   zh-Hans            Chinese (Simplified)                     code-point(?!)
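A quick way to check whether an observed ordering, such as the zh-Hans result on Mono 3.0.10, really is just a code-point sort is to print each string's code points and test whether the list is already in ordinal order. This is a minimal Python sketch. The sample lists are hypothetical placeholders, not the actual test output. Python compares strings by code point, which matches an ordinal UTF-16 comparison for characters in the Basic Multilingual Plane. The check is only informative if the sample strings' pinyin order and code-point order differ.

```python
def code_points(s):
    # Render each character of s as U+XXXX for manual inspection.
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

def is_code_point_order(observed):
    # True if every adjacent pair is in non-decreasing code-point order,
    # i.e. the list looks like an ordinal sort rather than a cultural one.
    return all(a <= b for a, b in zip(observed, observed[1:]))

# Hypothetical samples; replace with the lines printed by the test program.
looks_ordinal = ["中文", "北京", "阿里"]   # first chars: U+4E2D < U+5317 < U+963F
looks_pinyin = ["阿里", "北京", "中文"]    # a-li, bei-jing, zhong-wen

for s in looks_ordinal:
    print(s, code_points(s))
print(is_code_point_order(looks_ordinal))  # True
print(is_code_point_order(looks_pinyin))   # False
```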
(In case the formatting is screwed up in email, here it is monospaced: http://pastebin.com/SXYR7ucc )

If you're still following me (thank you!), I have a few questions:

1. Am I correct to expect that zh-CHS and zh-Hans should have the same collation behaviour?
2. Am I correct to expect that zh-Hans will have a pinyin-based collation order?
3. What systems/libraries are involved here? Does Mono depend on some system library for its collation order, or does it implement this itself? Are there particular configuration options I need to be aware of if I compile Mono myself?
4. How does Mono pick the default culture on its various platforms? Will it ever pick 'zh-Hans' as the default culture, or would it always prefer 'zh-CHS'? I'm worried that if it defaults to 'zh-Hans' for some Chinese users, they will get a surprising and unhelpful sort order.

Regards,
Weeble.

_______________________________________________
Mono-list maillist
http://lists.ximian.com/mailman/listinfo/mono-list
