On Sat, 6 Jul 2002, C Bobroff wrote:
> BTW, I can understand the heh-waw reversal and the 4 extra Persian letters
> being dumped at the end, but please tell me, why is the "kaf" out of
> order?

The point is that the difference is not only 4 letters, but 6. The codes
for Persian Kaf and Yeh (called "Keheh" and "Farsi Yeh" in Unicode) are
also different from their Arabic counterparts, and they also appear at
the end.

Actually, these are not the only things needed to get proper Persian
sorting. You should also think about getting Teh Marbuta sorted with
Heh. Also, all Hamza forms (Hamza, Alef With Hamza Above, Alef With
Hamza Below, Waw With Hamza Above, Yeh With Hamza Above) should sort
equally at the first level, between Alef and Beh. Only if two different
words come out equal at that level, like "mo'men" (Meem, Waw-Hamza,
Meem, Noon) and "ma'man" (Meem, Alef-Hamza, Meem, Noon), should the
difference between the Hamza forms be taken into account.

It gets more complicated if you consider Fatha, Kasra, etc., and then
punctuation, but let's set that aside for the moment. Proper sorting is
considered a process with at least four levels, even for English text.
But even that may not be enough, and some sophisticated preprocessing is
also needed: I remember an exercise from Knuth's The Art of Computer
Programming asking the reader to implement sorting the way librarians do
it. They file "2001: A Space Odyssey" under "T", for a start.

roozbeh

_______________________________________________
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb
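
P.S. A rough sketch in Python of the first two levels described above,
in case it helps. This is only an illustration, not a full collation
implementation. The names PRIMARY_GROUPS, WEIGHTS and sort_key are made
up for the example. The order of the Hamza forms inside their group, and
putting Arabic Kaf and Yeh in the same groups as Keheh and Farsi Yeh,
are my own assumptions. Fatha, Kasra, punctuation and the further levels
are ignored.

```python
# Each string is one first-level group; its index is the first-level weight.
# The position of a letter inside its group is used as the second-level weight.
# Assumptions: Arabic Kaf/Yeh share groups with Keheh/Farsi Yeh, and the
# order of the Hamza forms within their group is arbitrary.
PRIMARY_GROUPS = [
    "\u0627",                          # Alef
    "\u0621\u0623\u0625\u0624\u0626",  # Hamza forms, between Alef and Beh
    "\u0628",                          # Beh
    "\u067E",                          # Peh
    "\u062A",                          # Teh
    "\u062B",                          # Theh
    "\u062C",                          # Jeem
    "\u0686",                          # Tcheh
    "\u062D",                          # Hah
    "\u062E",                          # Khah
    "\u062F",                          # Dal
    "\u0630",                          # Thal
    "\u0631",                          # Reh
    "\u0632",                          # Zain
    "\u0698",                          # Jeh
    "\u0633",                          # Seen
    "\u0634",                          # Sheen
    "\u0635",                          # Sad
    "\u0636",                          # Dad
    "\u0637",                          # Tah
    "\u0638",                          # Zah
    "\u0639",                          # Ain
    "\u063A",                          # Ghain
    "\u0641",                          # Feh
    "\u0642",                          # Qaf
    "\u06A9\u0643",                    # Keheh, Arabic Kaf
    "\u06AF",                          # Gaf
    "\u0644",                          # Lam
    "\u0645",                          # Meem
    "\u0646",                          # Noon
    "\u0648",                          # Waw (before Heh in Persian)
    "\u0647\u0629",                    # Heh, Teh Marbuta
    "\u06CC\u064A",                    # Farsi Yeh, Arabic Yeh
]

# Map each letter to its (first-level, second-level) weights.
WEIGHTS = {
    ch: (primary, secondary)
    for primary, group in enumerate(PRIMARY_GROUPS)
    for secondary, ch in enumerate(group)
}


def sort_key(word):
    """Return (first-level weights, second-level weights) for a word.

    Characters not in the table are placed after all known letters and
    ordered among themselves by code point.
    """
    pairs = [WEIGHTS.get(ch, (len(PRIMARY_GROUPS), ord(ch))) for ch in word]
    primary = tuple(p for p, _ in pairs)
    secondary = tuple(s for _, s in pairs)
    # Tuples compare left to right, so the second level only matters
    # when the first-level weights of two words are identical.
    return primary, secondary


momen = "\u0645\u0624\u0645\u0646"  # Meem, Waw-Hamza, Meem, Noon
maman = "\u0645\u0623\u0645\u0646"  # Meem, Alef-Hamza, Meem, Noon
words = sorted([momen, maman], key=sort_key)
```

The two example words get identical first-level weights, so their
relative order is decided by the second level alone.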
