The corpus for the translation model should be in two parallel files, one
file per language, with every token written in the factored format
word|pos|lemma and so on. The factors of a token are joined by "|" with no
spaces around it, and tokens are separated by spaces. You can prepare the
files using WordNet, the Stanford tools, or any tagger and stemmer that can
deal with your language pair. Before you pass the files to Moses you may need
to adjust the text files with a Python script (write it yourself).
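
A minimal sketch of such a script, assuming your tagger already gives you
(word, pos, lemma) triples for each sentence. The function name and the
sample tags are only illustrations and are not part of Moses. Replacing "|"
with "&#124;" follows the escaping used by the Moses tokenizer; replacing
inner spaces with "_" is my own assumption.

```python
def to_factored_line(tagged_tokens):
    """Join (word, pos, lemma) triples into one factored corpus line."""
    def clean(factor):
        # A literal pipe or a space inside a factor would break the format,
        # so escape the pipe and replace inner spaces (assumed convention).
        return factor.replace("|", "&#124;").replace(" ", "_")

    return " ".join(
        "|".join(clean(factor) for factor in triple)
        for triple in tagged_tokens
    )


if __name__ == "__main__":
    sentence = [("houses", "NNS", "house"), ("are", "VBP", "be")]
    print(to_factored_line(sentence))
```

Run it over each sentence of both files and write one output line per
sentence, so that the two files stay parallel line by line.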

For the language model, you must prepare the training text as tag sequences,
one sentence per line, like this:
Verb noun noun
Noun Det adj
.......
It depends on the target language only. Then build it as a usual n-gram LM.
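
A rough sketch of how the tag sequences could be pulled out of a factored
target-side file, assuming the pos factor is the second field of each token
(index 1). The function name and the default index are my own choices, so
adjust them to your factor order.

```python
def pos_sequence(factored_line, pos_index=1):
    """Return only the POS factor of every token in a factored line."""
    return " ".join(
        token.split("|")[pos_index] for token in factored_line.split()
    )


if __name__ == "__main__":
    print(pos_sequence("houses|NNS|house are|VBP|be"))
```

Write one such line per target sentence into a text file and give that file
to your usual n-gram LM toolkit.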

> On May 2, 2016, at 10:11, Sašo Kuntaric <[email protected]> wrote:
> 
> Hi all,
> 
> I am having some issues producing the corpora in the correct format for Moses 
> to execute factored training.
> 
> I am looking at the factored tutorial on the Moses website and I am 
> wondering, how to get such consistent corpora for two languages. What tools 
> are being used and can they be trained for specific languages (Slovenian in 
> my example). Are such tools available for download or is such data produced 
> with custom scripts?
> 
> -- 
> Best regards,
> 
> Sašo
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
