Dear friends,
I'm trying to build an Arabic-English (ar-en) system. I have downloaded the
Arabic-English corpora from http://www.euromatrixplus.eu/multi-un/
The first problem is that the Moses tokenizer does not include Arabic, so I
ran it with the English settings instead.
The second problem is that the corpus is in XML format, so the XML tags get
tokenized as well. After tokenization, the English (and also the Arabic)
texts look like this:


< p n = " 2 " >
< s n = " 2 " > Agenda item 116 < / s >
< / p >
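
One idea I had, though I am not sure it is the right approach, is to pull the sentence text out of the XML before running the tokenizer, so that the tags never reach it. Below is a minimal Python sketch. It assumes that the untokenized files wrap each sentence in an <s> element, as the output above suggests. The <text> root element in the sample and the function name extract_sentences are my own placeholders, not something taken from the corpus or from Moses.

```python
import xml.etree.ElementTree as ET


def extract_sentences(xml_text):
    """Return the plain text of every <s> element, one string per sentence."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # itertext() also collects text nested inside child elements, if any
        text = "".join(s.itertext()).strip()
        if text:
            sentences.append(text)
    return sentences


# Hypothetical sample: the <text> wrapper is assumed, only <p> and <s>
# appear in the tokenized output shown above.
sample = '<text><p n="2"><s n="2">Agenda item 116</s></p></text>'
for line in extract_sentences(sample):
    print(line)
```

The printed lines, one sentence per line, could then be written to a plain text file and passed to the tokenizer in place of the XML.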

What should I do? Could you please help me? I'm stuck at this point.
Thank you for your help.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
