Hi Harry, 12/9/09
At 22:51 +0100 6/09/09, [email protected] wrote:
It occurred to me that there are some other interesting special cases
around this problem, not relating to scripts, but where alternative
orthographies exist side-by-side. English (US vs British vs some other
hybrids) and German (old vs new vs Swiss) would be two examples; Hebrew
spelling is (I believe) not 100% standardized, especially for proper
names; in Japanese there are often equally valid alternatives for writing
certain words (kanji vs hiragana vs mixed, e.g. "oshidashi" could be written
with two kanji, or K-H-K-H, or all in hiragana).
I know that in some cases there are more differences than just spelling
variants, but it seems to me that the BrE vs AmE question (for a
translator, especially an online MT system) is not unlike the problem for
Chinese (mainland or Taiwanese?). ... in each case, which do you offer
as default, and what if the user wants the other one?
On Christian's examples, I have heard it said that Hindi and Urdu have
grown apart to an extent where they can become mutually unintelligible
Your information does not seem right. As I wrote, they all
understand Bollywood. That information (again)
comes from Abbas Malik, a Pakistani PhD student
here who has already done quite a lot on
transliteration (see his papers at COLING-06 and
COLING-08, for example).
Also, I stressed in my e-mail that J.Halpern
(cjk.org) would be the most precise and exhaustive
source of information concerning scripts,
transliteration and transcription for and
between Chinese, Japanese, Korean, English (plus
a lot on Arabic).
(though you're never sure if they really are, or it's just a political
issue), and you might expect Serbian and Croatian to do likewise (remember
15 years ago when Yugoslavia's language was said to be Serbocroat?)
Consult P.Pognan (Prof. of Czech at INaLCO,
Paris, a long-time researcher in NLP) on
relations between all Slavic languages. He has
studied almost all of them, including Sorbian.
Best,
Xan
--
-------------------------------------------------------------------------
Christian Boitet
(Pr. Université Joseph Fourier)
Groupe d'Etude pour la Traduction Automatique
et le Traitement Automatisé des Langues et de la Parole
G E T A L P
GETALP, LIG-campus, BP 53 (ex: GETA, CLIPS, IMAG-campus)
Tel: +33 (0)4 76 51 43 55 / 51 48 17 Fax: +33 (0)4 76 63 56 86
385, rue de la Bibliothèque Mel: [email protected]
38041 Grenoble Cedex 9, France