Re: [Moses-support] Understanding moses.ini

Pierre Lison Fri, 04 Mar 2016 05:43:05 -0800

Hi Bogdan,

Here are some answers:


> 1. What do the four weights under TranslationModel0 correspond to?
> Reading "Tuning for Quality" in
> http://www.statmt.org/moses/?n=Moses.Tutorial suggests they are the
> weights of the four different components of the translation, in this
> order: (1) phrase translation, (2) language model, (3) reordering
> model, (4) word penalty. Is this true?

No, the four scores are the inverse phrase translation probability \phi(f|e), 
inverse lexical weighting lex(f|e), direct phrase translation probability 
\phi(e|f) and direct lexical weighting lex(e|f) -- see 
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases.
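
To make the score order concrete, here is a minimal sketch that reads the four 
scores from a single phrase-table line. It assumes the usual " ||| "-separated 
text format and the score order listed above; the function name and the example 
line (including its numbers) are made up for illustration.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into source, target and named scores."""
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(value) for value in fields[2].split()]
    # Assumed order, as in the answer above.
    names = ["phi(f|e)", "lex(f|e)", "phi(e|f)", "lex(e|f)"]
    return source, target, dict(zip(names, scores))


# Hypothetical line; the probabilities are invented, not from a real model.
example = "maison ||| house ||| 0.6 0.5 0.7 0.4"
src, tgt, scores = parse_phrase_table_line(example)
print(src, tgt, scores)
```

Each of the four weights under TranslationModel0 then multiplies the log of the 
score in the same position.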
> 
> 2. What is the range of these four weights?

The weights can be positive or negative. If you want to, you can provide 
bounds for the weights during MERT tuning, but other than that, I don't think 
there is a specific lower or upper bound. 
> 
> 3. Will setting one of them to 0 disable that component in the model?

If you set one weight to zero, its model won't be able to influence the scoring 
process during decoding, but the feature function will still be called. If you 
want to truly disable the model, you should comment out both the feature 
function and its weight.
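
As a rough illustration of why a zero weight silences a model, here is a small 
sketch of a weighted sum of feature scores. The feature names and values are 
hypothetical, and this is not how the decoder is implemented internally; it only 
shows the arithmetic.

```python
def loglinear_score(weights, features):
    """Weighted sum of feature values (log-probabilities or penalties)."""
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical weights: the language model weight is set to zero.
weights = {"tm": 1.0, "lm": 0.0}

# Two hypotheses that differ only in their language model score.
hyp_a = {"tm": -2.0, "lm": -5.0}
hyp_b = {"tm": -2.0, "lm": -50.0}

score_a = loglinear_score(weights, hyp_a)
score_b = loglinear_score(weights, hyp_b)
print(score_a, score_b)
```

Both hypotheses get the same total, so the language model no longer affects the 
ranking, even though its score was still computed.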

> 4. Why is there a separate weight LM0= 0.5 if there is already a
> weight for the language model under TranslationModel0?

See above (the weights in TranslationModel0 do not include the language model).
> 
> 
> 5. Why is there a separate weight WordPenalty0= -1 if there is already
> a weight for word penalty under TranslationModel0?

See above.
> 
> Is it the case that WordPenalty0 describes how the word penalty
> component should behave (favoring longer/shorter output), and the
> weight for word penalty under TranslationModel0 describes how
> important this component should be, relative to the other three?

> 5.1. If so, what does the separate weight LM0= 0.5 mean?

See above.

> 
> 5.2. "Model" in http://www.statmt.org/moses/?n=Moses.Background says
> that "Usually, this factor [ω (called word cost)] is larger than 1,
> biasing toward longer output."
> "Tuning for Quality" in http://www.statmt.org/moses/?n=Moses.Tutorial
> says that "Negative values for the word penalty favor longer output,
> positive values favor shorter output."
> Which one is it?

The two sections are talking about different models. The background text 
describes the traditional "noisy-channel" model of machine translation, where 
the models are not assigned an explicit weight. But Moses (like most current 
SMT systems) actually relies on a log-linear model, where the total score for a 
translation is a weighted sum of the scores of the individual models, each 
model having its own weight. 

WordPenalty0 is the weight of the word penalty feature in the log-linear 
model, not the "word cost" factor of the noisy-channel model. 
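
A small sketch of the sign behaviour described in the tutorial, under the 
assumption that the word penalty feature value is minus the number of target 
words (my understanding of the feature, not something stated on the pages you 
quote). The function name is made up.

```python
def word_penalty_term(weight, num_target_words):
    """Contribution of the word penalty to the log-linear score."""
    # Assumption: the feature value is the negated target length.
    feature_value = -num_target_words
    return weight * feature_value


# Negative weight: the longer output gets the higher contribution.
negative_short = word_penalty_term(-1.0, 3)
negative_long = word_penalty_term(-1.0, 5)

# Positive weight: the shorter output gets the higher contribution.
positive_short = word_penalty_term(1.0, 3)
positive_long = word_penalty_term(1.0, 5)

print(negative_short, negative_long, positive_short, positive_long)
```

Under that assumption, a negative weight rewards each extra output word and a 
positive weight penalises it, which matches the tutorial's statement.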

> 
> 6. How do the different parameters described under "Model" in
> http://www.statmt.org/moses/?n=Moses.Background relate to the weights
> in moses.ini?
> - "We use a simple distortion model d(starti,endi-1) =
> α|starti-endi-1-1| with an appropriate value for the parameter α" -> α
> looks like the weight (3) reordering model
> - "In order to calibrate the output length, we introduce a factor ω
> (called word cost) for each generated English word" -> ω looks like
> the weight (4) word penalty
> - Which ones are the others?

See above - the background section only describes a heavily simplified model 
with no model weighting, not the actual log-linear model. 
> 
> 7. What is the range for distortion-limit? Will setting
> distortion-limit= 0 ensure that the output has the same length as the
> input?

No, setting the distortion limit to zero simply means there will be no 
reordering between the phrases, but the phrase pairs themselves may have 
different lengths. For instance, the English phrase "home" (1 word) might be 
mapped to "à la maison" (3 words) in French.
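
The point can be shown with a toy monotone lookup. This is only a word-by-word 
sketch with a hypothetical one-entry phrase table, not the Moses decoder; 
unknown words are simply copied to the output.

```python
def translate_monotone(source_words, phrase_table):
    """Translate left to right with no reordering between phrases."""
    output = []
    for word in source_words:
        # Unknown words are copied verbatim in this toy version.
        output.extend(phrase_table.get(word, word).split())
    return output


# Hypothetical phrase table with a single entry.
toy_table = {"home": "à la maison"}

source = ["home"]
result = translate_monotone(source, toy_table)
print(len(source), len(result), result)
```

The phrases stay in source order, yet one input word produces three output 
words, so the lengths differ.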

> 
> 8. What does PhrasePenalty0= 0.2 do? I thought PhrasePenalty0 is
> always exp(1), which is not the same as 0.2

The initial parameter value 0.2 (like all parameters in the initial moses.ini 
file) is just an arbitrary value. Remember that all these parameters *must* be 
tuned before you actually start testing the system. Otherwise you will just get 
very poor translation results. 
> 
> 9. What does Distortion0= 0.3 do? There is already a weight (3)
> reordering model under TranslationModel0

Again, 0.3 is just an arbitrary value set as the weight for the distortion 
model. The actual weight will be learned via tuning.
> 
> 10. Would setting UnknownWordPenalty0= 1 ensure that no penalty is
> incurred for copying unknown words verbatim to the output?

I'm not sure exactly how the UnknownWordPenalty feature works. 
> 
> 
> Finally, where are all these parameters documented? I couldn't find
> any definitive source. The decoder parameter reference at
> http://www.statmt.org/moses/?n=Moses.DecoderParameters didn't
> illuminate me enough.

The description of the various models (translation models, language models, 
etc.) is spread over the Moses website. You can also find additional 
explanations in Philipp Koehn's book on SMT.

> 
> Many, many thanks in advance.
> Bogdan
> 
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Pierre Lison (Postdoctoral Research Fellow)
Department of Informatics, University of Oslo
Mobile: +47.967.998.12
Web: http://folk.uio.no/plison

