Hi all,

Thank you, Philipp, for all the useful info. I will take a closer look at the scripts you mentioned.
I do have one follow-up question. As I said, I really enjoyed working with the factored corpora in the example. How were they created? Is there a tool I can use to create similar ones?

Best regards,

Sašo

2016-05-06 0:08 GMT+02:00 Philipp Koehn <[email protected]>:

> Hi,
>
> life is easier with factored models if you use the experiment.perl set-up,
> where you just have to specify the factor set-up and the scripts that
> generate factors.
>
> These scripts take the tokenized text and replace each word with a factor
> (e.g., replace each word with the POS tag).
>
> The POS LM is trained on such a corpus - each word is replaced by a
> POS tag, and then the standard LM training process is run over it.
>
> See $MOSES/scripts/ems/example/config.factored for an example.
>
> -phi
>
> On Wed, May 4, 2016 at 3:30 PM, Sašo Kuntaric <[email protected]>
> wrote:
> > Hello again,
> >
> > I believe I can wrap my head around the theoretical part, but the English
> > and German corpora in the Moses factored model tutorial
> > (http://www.statmt.org/moses/?n=Moses.FactoredTutorial) look beautifully
> > factored, so my question is: how were the original corpora processed? Was
> > a specific tagger used, and was there any manual/script postprocessing
> > done?
> >
> > And since I am already bugging everyone, how is the language model pos.lm
> > created? Is it extracted from a file, created manually, or in another way?
> >
> > Thank you in advance for all the replies.
> >
> > Best regards,
> >
> > Sašo
> >
> > 2016-05-02 19:45 GMT+02:00 Marwa Refaie <[email protected]>:
> >>
> >> The corpus for the translation model should be in 2 parallel files, one
> >> file per language, in the format Word | pos | Lemma .... You can prepare
> >> the files using WordNet, Stanford, or any tagger & stemmer that can deal
> >> with your language pair. Before passing the files to Moses, you may need
> >> to adjust the text files with a Python script (write it yourself).
> >>
> >> For the language model, you must build the corpus as follows:
> >> Verb noun noun
> >> Noun Det adj
> >> .......
> >> depending on the target language only. Then build it as a usual
> >> n-gram LM.
> >>
> >> Sent from my iPad
> >>
> >> > On May 2, 2016, at 10:11, Sašo Kuntaric <[email protected]>
> >> > wrote:
> >> >
> >> > Hi all,
> >> >
> >> > I am having some issues producing the corpora in the correct format
> >> > for Moses to execute factored training.
> >> >
> >> > I am looking at the factored tutorial on the Moses website and I am
> >> > wondering how to get such consistent corpora for two languages. What
> >> > tools are being used, and can they be trained for specific languages
> >> > (Slovenian in my case)? Are such tools available for download, or is
> >> > such data produced with custom scripts?
> >> >
> >> > --
> >> > Best regards,
> >> >
> >> > Sašo
> >
> > --
> > lp,
> >
> > Sašo

--
lp,

Sašo
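To make the factor-generation step described in the thread concrete, here is a minimal sketch in Python of what such a script might look like. It is not one of the scripts shipped with Moses: the function names (`toy_tag`, `to_factored`, `to_pos_only`) and the tiny lexicon are hypothetical, and the lexicon lookup only stands in for a real POS tagger trained for your language. The sketch assumes the input is already tokenized, one sentence per line, and that no token contains a literal `|`.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon; a real setup would call an actual POS tagger here.
TOY_LEXICON: Dict[str, str] = {
    "the": "DET",
    "house": "NN",
    "is": "VB",
    "small": "ADJ",
}


def toy_tag(tokens: List[str]) -> List[str]:
    """Return one tag per token; unknown words get the placeholder tag UNK."""
    return [TOY_LEXICON.get(tok.lower(), "UNK") for tok in tokens]


def to_factored(line: str, tag: Callable[[List[str]], List[str]] = toy_tag) -> str:
    """Turn a tokenized sentence into word|pos tokens for the translation model."""
    tokens = line.split()
    tags = tag(tokens)
    # Assumes tokens contain no literal '|', since that is the factor separator.
    return " ".join(f"{word}|{pos}" for word, pos in zip(tokens, tags))


def to_pos_only(line: str, tag: Callable[[List[str]], List[str]] = toy_tag) -> str:
    """Replace each word with its tag; an n-gram POS LM is trained on this output."""
    return " ".join(tag(line.split()))


if __name__ == "__main__":
    sentence = "the house is small"
    print(to_factored(sentence))
    print(to_pos_only(sentence))
```

Running `to_pos_only` over every line of the target-side corpus would give the kind of tag-only text on which the standard LM training is then run, as Philipp describes.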
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
