Christian Boitet
Sat, 12 Sep 2009 10:31:16 -0700
Hi Harry, 12/9/09
At 22:51 +0100 6/09/09, some...@cs.man.ac.uk wrote:
It occurred to me that there are some other interesting special cases around this problem, not relating to scripts, but where alternative orthographies exist side-by-side. English (US vs British vs some other hybrids) and German (old vs new vs Swiss) would be two examples; Hebrew spelling is (I believe) not 100% standardized, especially for proper names; in Japanese there are often equally valid alternatives for writing certain words (kanji vs hiragana vs mixed, e.g. "oshidashi" could be written with two kanji, or K-H-K-H, or all in hiragana). I know that in some cases there are more differences than just spelling variants, but it seems to me that the BrE vs AmE question (for a translator, especially an online MT system) is not unlike the problem for Chinese (mainland or Taiwanese?). ... in each case which do you offer as default, and what if the user wants the other one? On Christian's examples, I have heard it said that Hindi and Urdu have grown apart to an extent where they can become mutually unintelligible
Your information does not seem right. As I wrote, they all understand Bollywood. That information (again) comes from Abbas Malik, a Pakistani PhD student here who has already done quite a lot on transliteration (see his papers at COLING-06 and COLING-08, for example).
Also, I stressed in my e-mail that J. Halpern (cjk.org) would be the most precise and exhaustive source of information concerning scripts, transliteration and transcription for and between Chinese, Japanese, Korean and English (plus a lot on Arabic).
(though you're never sure if they really are, or it's just a political issue), and you might expect Serbian and Croatian to do likewise (remember 15 years ago when Yugoslavia's language was said to be Serbocroat?)
Consult P. Pognan (Prof. of Czech at INaLCO, Paris, and a long-time researcher in NLP) on the relations between all the Slavic languages. He has studied almost all of them, including Sorbian.
Best,
Xan
--
-------------------------------------------------------------------------
Christian Boitet
(Pr. Université Joseph Fourier)
Groupe d'Etude pour la Traduction Automatique
et le Traitement Automatisé des Langues et de la Parole
G E T A L P
GETALP, LIG-campus, BP 53 (ex: GETA, CLIPS, IMAG-campus)
Tel: +33 (0)4 76 51 43 55 / 51 48 17 Fax: +33 (0)4 76 63 56 86
385, rue de la Bibliothèque Mel: christian.boi...@imag.fr
38041 Grenoble Cedex 9, France