On Sun, Mar 13, 2005 at 05:52:09PM -0500, Morten Welinder wrote:
> We simply use what we have from glib; we don't do our own collating.
All right. Pass my comment on to them then (or do you want me to do it myself?).

> The situation is much more complicated than you might realize (even
> though if you read Knuth you have seen it all).

This is not a reason to do nothing!

(rant: That's a very annoying thing with many university-bred computer scientists: they mention that in theory such a problem is NP-complete (even though in all practical cases it is easy), or they find some super exception. Since they cannot be 100% right (but only, maybe, 100-epsilon% right), they do nothing. They seem not to understand real life and the fact that "worse is better".)

> For example, \oe -> oe would be wrong in Danish.

Right. I don't speak Danish, and there are many languages out there with problems I cannot even imagine. Apparently, Russians can't sort words across the Latin and Cyrillic alphabets: all words have to be written in the same alphabet. There is probably no country or language in the world whose sorting rules apply to several alphabets at the same time. Anyway, it is an issue specific to each country or language academy.

I can't think of a language where "é" should not be considered as "e" *in first approximation*. The same goes for the other accented letters. Start with the simple cases: Latin-1 and its simple letters cover most of the uses.

The Spanish have the two words "que" and "qué". I don't know which is supposed to come first (and probably almost no Spanish speaker would know either; I wouldn't know in French without opening a dictionary). But even if you get it wrong, it is much better to have them next to each other than to push the "é" to the end of the alphabet.

The Spanish used to have the special letters "ll" and "ch". Now those just behave as they would in English with respect to sorting and the alphabet.

For the complex cases, I guess you may rely on the locale to know which language the spreadsheet is written in, and then contact competent native speakers to learn the rules of their language.
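
To make the "first approximation" concrete, here is a minimal sketch in Python. It is only an illustration, not what glib or Gnumeric actually do: the function name first_approx_key and the word list are made up for the example. It compares the bare letters first and uses the accented spelling only as a tie-breaker, so "que" and "qué" end up next to each other whichever one is "supposed" to come first.

```python
import unicodedata

def first_approx_key(word):
    """Hypothetical sort key: base letters first, original spelling as tie-breaker."""
    # NFD splits an accented letter into its base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks to keep only the bare letters.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The second tuple element keeps the order deterministic when two
    # words differ only by their accents.
    return (base.casefold(), word)

# Illustrative word list (an assumption, not taken from any real data).
words = ["quiero", "qué", "zorro", "que", "queso", "quedar"]

# Plain code-point order pushes "qué" past "queso" and "quiero".
print(sorted(words))

# Accents ignored in first approximation:
# ['que', 'qué', 'quedar', 'queso', 'quiero', 'zorro']
print(sorted(words, key=first_approx_key))
```

Note that this does nothing for ligatures such as "œ", which have no Unicode decomposition, so the Danish objection is simply sidestepped here rather than answered. Language-specific rules would still have to come from the locale.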
If a spreadsheet is written in several languages at the same time, then:
1/ this is a special case, very rare
2/ the user should not expect sorting to work
3/ maybe Gnumeric can use a "language" attribute for such or such field

I doubt this issue is absent from the Unicode sites and FAQs. Maybe that would be the right place to take the debate.

If/when Gnumeric does Chinese, Arabic, and the like, how are you (or the glib people) going to deal with this?
