Hello,

I would really appreciate help setting up an Arabic-English Moses system.

First, I downloaded the United Nations Arabic-English corpora from:
http://www.euromatrixplus.net/multi-un/

Then I tried to tokenize the Arabic side the same way I did when following the Moses tutorial with the French-English corpora. I have a couple of questions:

a. Since the Moses tokenizer has no "ar" language, what can I do to tokenize Arabic properly? The command and the warning I get are as follows:

    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ar < ~/corpus/training/xml/ar/2009/S_PV6164-ar.xml > ~/corpus/S_PV6164-ar.tok.xml

    Tokenizer Version 1.1
    Language: ar
    Number of threads: 1
    WARNING: No known abbreviations for language 'ar', attempting fall-back to English version...

b. Can anyone who has used MADA+TOKAN help me out? I am finding its manual very hard to follow:
http://www1.ccls.columbia.edu/MADA/CCLS-12-01.pdf

Thank you!
