Re: [Moses-support] Data for building a factored model

Sašo Kuntaric Fri, 06 May 2016 11:34:00 -0700

Hi all,

Thank you Philipp for all the useful info, I will take a closer look at the
mentioned scripts.


I do have one follow-up question. Like I said, I really enjoyed working
with the factored corpora in the example. How were those created? Is there
a tool I can use to create similar ones?

Best regards,

Sašo

2016-05-06 0:08 GMT+02:00 Philipp Koehn <[email protected]>:

> Hi,
>
> Life with factored models is easier if you use the experiment.perl set-up,
> where you just have to specify the factor set-up and the scripts that
> generate the factors.
>
> These scripts take the tokenized text and replace each word with a factor
> (e.g., replace each word with the POS tag).
>
> The POS LM is trained on such a corpus - each word is replaced by a
> POS tag, and then the standard LM training process is run over it.
>
> See $MOSES/scripts/ems/example/config.factored for an example.
>
> -phi
>
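As a rough illustration of what such a factor script does, here is a minimal Python sketch: it reads tokenized lines and replaces each word with a POS tag, giving the kind of corpus a POS LM would be trained on. TOY_TAGS, pos_factor_line and make_pos_corpus are hypothetical names, and the lookup table only stands in for a real tagger; this is not one of the scripts shipped with Moses.

```python
# Toy lookup standing in for a real POS tagger (hypothetical data, not Moses code).
TOY_TAGS = {"the": "DET", "house": "NN", "is": "VB", "small": "ADJ"}


def pos_factor_line(tokenized_line, tags=TOY_TAGS, unknown="UNK"):
    """Replace every token of one tokenized line with its POS tag."""
    return " ".join(tags.get(tok.lower(), unknown) for tok in tokenized_line.split())


def make_pos_corpus(lines):
    """Build a tag-only corpus: one output line per input line, one tag per token."""
    return [pos_factor_line(line) for line in lines]


corpus = make_pos_corpus(["the house is small", "The garden is small"])
for line in corpus:
    print(line)
```

Each output line has exactly as many tags as the input line has tokens, so the tag-only file can be fed to ordinary n-gram LM training in place of the word corpus.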
> On Wed, May 4, 2016 at 3:30 PM, Sašo Kuntaric <[email protected]>
> wrote:
> > Hello again,
> >
> > I believe I can wrap my head around the theoretical part, but the English
> > and German corpora in the Moses factored model tutorial
> > (http://www.statmt.org/moses/?n=Moses.FactoredTutorial) look beautifully
> > factored, so my question is how were the original corpora processed? Was a
> > specific tagger used and was there any manual/script postprocessing done?
> >
> > And since I am already bugging everyone, how is the language model pos.lm
> > created? Is it extracted from a file, created manually or in another way?
> >
> > Thank you in advance for all the replies.
> >
> > Best regards,
> >
> > Sašo
> >
> > 2016-05-02 19:45 GMT+02:00 Marwa Refaie <[email protected]>:
> >>
> >> The corpus for the translation model should be in two parallel files, one
> >> per language, in the format word|pos|lemma ... You can prepare the files
> >> using WordNet, the Stanford tagger, or any tagger and stemmer that can
> >> handle your language pair. Before passing the files to Moses, you may need
> >> to adjust the text files with a Python script (write it yourself).
> >>
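The adjustment script mentioned above could look roughly like the following sketch, which joins three aligned token streams (surface word, POS tag, lemma) into one factored line with `|` between factors and no surrounding spaces, as in the tutorial corpora. The function name combine_factors and the example tags are assumptions made for illustration; the tags and lemmas would come from whatever tagger and lemmatizer suit the language.

```python
def combine_factors(words, tags, lemmas, sep="|"):
    """Join three aligned token lists into one factored line (word|pos|lemma)."""
    if not (len(words) == len(tags) == len(lemmas)):
        raise ValueError("factor streams must have the same number of tokens")
    factored = []
    for factors in zip(words, tags, lemmas):
        # An empty factor or a stray separator would corrupt the factored format.
        if any(not f or sep in f for f in factors):
            raise ValueError(f"empty factor or separator inside factor: {factors!r}")
        factored.append(sep.join(factors))
    return " ".join(factored)


line = combine_factors(
    "the houses are small".split(),
    "DET NN VB ADJ".split(),
    "the house be small".split(),
)
print(line)
```

Checking the token counts up front matters because a tagger that re-tokenizes its input silently shifts every later factor onto the wrong word.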
> >> For the language model, prepare a corpus of tag sequences such as
> >> Verb noun noun
> >> Noun Det adj
> >> ... from the target language only, then build it as a usual n-gram LM.
> >>
> >> Sent from my iPad
> >>
> >> > On May 2, 2016, at 10:11, Sašo Kuntaric <[email protected]> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > I am having some issues producing the corpora in the correct format for
> >> > Moses to execute factored training.
> >> >
> >> > I am looking at the factored tutorial on the Moses website and I am
> >> > wondering how to get such consistent corpora for two languages. What
> >> > tools are being used, and can they be trained for specific languages
> >> > (Slovenian in my example)? Are such tools available for download, or is
> >> > such data produced with custom scripts?
> >> >
> >> > --
> >> > Best regards,
> >> >
> >> > Sašo
> >> > _______________________________________________
> >> > Moses-support mailing list
> >> > [email protected]
> >> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> >
> > --
> > Best regards,
> >
> > Sašo
> >
> >
>



-- 
Best regards,

Sašo

