The corpus for the translation model should be in two parallel files, one file per language, with every token in the factored format word|pos|lemma ... (factors joined by "|", tokens separated by spaces). You can prepare the files using WordNet, the Stanford tools, or any tagger and stemmer that can handle your language pair. Before passing the files to Moses you may need to adjust the text with a Python script, which you will have to write yourself.
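Such a script could look roughly like the sketch below. It assumes your tagger and lemmatizer have already produced three token-aligned lines per sentence (words, POS tags, lemmas). The function names and the word|pos|lemma order are only illustrative, not something Moses prescribes. Replacing a literal "|" with "&#124;" follows the convention of the Moses tokenizer; that step does nothing if your text is already escaped.

```python
# Minimal sketch: build factored tokens (word|pos|lemma) from aligned lines.
# All names here are hypothetical helpers, not part of Moses.

def escape_factor(text, sep="|"):
    """Replace the factor separator inside a factor so it cannot break the format."""
    return text.replace(sep, "&#124;")


def combine_factors(words_line, pos_line, lemma_line, sep="|"):
    """Merge three token-aligned lines into one factored line."""
    words = words_line.split()
    tags = pos_line.split()
    lemmas = lemma_line.split()
    if not (len(words) == len(tags) == len(lemmas)):
        raise ValueError(
            "token counts differ: %d words, %d tags, %d lemmas"
            % (len(words), len(tags), len(lemmas))
        )
    return " ".join(
        sep.join(escape_factor(f, sep) for f in factors)
        for factors in zip(words, tags, lemmas)
    )


def extract_factor(factored_line, index, sep="|"):
    """Pull one factor out of every token, e.g. index 1 for the POS tags."""
    return " ".join(tok.split(sep)[index] for tok in factored_line.split())


if __name__ == "__main__":
    line = combine_factors("the houses", "DET NN", "the house")
    print(line)                     # the|DET|the houses|NN|house
    print(extract_factor(line, 1))  # DET NN
```

Run it over each side of the corpus line by line and make sure both output files keep the same number of lines. The extract_factor helper gives you the tag-only text of the target side if you want a language model over tags.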
For the language model, build the training text from the target language only, as sequences of tags, for example:

Verb noun noun Noun Det adj ...

Then train it as a usual n-gram LM.

> On May 2, 2016, at 10:11, Sašo Kuntaric <[email protected]> wrote:
>
> Hi all,
>
> I am having some issues producing the corpora in the correct format for Moses to execute factored training.
>
> I am looking at the factored tutorial on the Moses website and I am wondering, how to get such consistent corpora for two languages. What tools are being used and can they be trained for specific languages (Slovenian in my example). Are such tools available for download or is such data produced with custom scripts?
>
> --
> Best regards,
>
> Sašo
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
