Dear friends,

I'm trying to build an Arabic-English (ar-en) system. I have downloaded the Arabic-English parallel corpora from http://www.euromatrixplus.eu/multi-un/

The first problem is that the Moses tokenizer does not support Arabic, so I ran it with English instead. The second problem is that the corpus is in XML format, so after tokenization the English text (and the Arabic text too) looks like this because of the XML tags:

< p n = " 2 " > < s n = " 2 " > Agenda item 116 < / s > < / p >

What should I do? Could you please help me? I'm stuck at this point.

Thank you for your help.
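One possible approach is to strip the XML markup before tokenizing rather than after, so that the tokenizer only ever sees plain sentences. The sketch below is only an illustration under assumptions: it assumes each sentence sits in an <s> element as in the sample line above, that the file is well-formed XML, and that one sentence per output line is what is wanted. The function name extract_sentences and the <text> wrapper in the sample are hypothetical, not part of Moses or of the corpus.

```python
import xml.etree.ElementTree as ET


def extract_sentences(xml_text):
    """Return the plain text of every <s> element, one string per sentence.

    Assumption: sentences are wrapped in <s> tags, as in the sample line.
    """
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Join any nested text and collapse runs of whitespace.
        text = " ".join("".join(s.itertext()).split())
        if text:
            sentences.append(text)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample; the <text> wrapper is only there to give a root.
    sample = '<text><p n="2"><s n="2">Agenda item 116</s></p></text>'
    for line in extract_sentences(sample):
        print(line)
```

The resulting lines could then be written to a plain-text file and passed to the tokenizer. If the same extraction is applied to both the Arabic and the English side, it is probably worth checking that the two output files still have the same number of lines before training.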
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
