Hello,
I would really appreciate help setting up an Arabic-English Moses system.
First, I downloaded the United Nations Arabic-English corpus from:
http://www.euromatrixplus.net/multi-un/
Then I tried to tokenize the Arabic side the same way I did when following the
Moses tutorial with the French-English corpus.
I have a couple of questions:
a. The Moses tokenizer does not seem to support "ar" as a language. What can I
do about this when tokenizing?
The command and its output are as follows:
 ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ar \
   < ~/corpus/training/xml/ar/2009/S_PV6164-ar.xml \
   > ~/corpus/S_PV6164-ar.tok.xml
Tokenizer Version 1.1
Language: ar
Number of threads: 1
WARNING: No known abbreviations for language 'ar', attempting fall-back to English version...
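
To make clear what I am after from the tokenization step, here is a minimal Python sketch of the behaviour I would expect for Arabic: punctuation split off from neighbouring words, including the Arabic comma, semicolon and question mark. This is not the Moses tokenizer and not MADA+TOKAN. The function name simple_tokenize and the choice of punctuation set are my own assumptions, and it does no clitic segmentation or abbreviation handling at all.

```python
import re
import string

# Assumption: these three Arabic marks are treated like ASCII punctuation.
# U+060C Arabic comma, U+061B Arabic semicolon, U+061F Arabic question mark.
ARABIC_PUNCT = "\u060C\u061B\u061F"

# Character class matching any single ASCII or Arabic punctuation mark.
PUNCT_RE = re.compile("([" + re.escape(string.punctuation + ARABIC_PUNCT) + "])")


def simple_tokenize(line: str) -> str:
    """Hypothetical helper (not part of Moses): put spaces around every
    punctuation mark, then collapse runs of whitespace to single spaces."""
    spaced = PUNCT_RE.sub(r" \1 ", line)
    return " ".join(spaced.split())
```

Applied line by line to a corpus file, this would give space-separated tokens, but it also splits decimals such as 3.5 and leaves XML markup in pieces, so it is only a rough illustration of the goal and not a substitute for a proper Arabic tokenizer.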

b. Could anyone who has used MADA+TOKAN help me out? I am finding its
tutorial very hard to understand:
http://www1.ccls.columbia.edu/MADA/CCLS-12-01.pdf


Thank you!
