Re: [Moses-support] MERT's Powell Search

liling tan Fri, 11 Dec 2015 07:49:31 -0800

Dear Adam and Moses devs/users,

@Adam, thank you for the explanation of line 6 of the pseudo code. I
understand it better now.

I have a few more short questions about the pseudo code for the Powell
search on slide 37 of http://mt-class.org/jhu/slides/lecture-tuning.pdf:

On line 6, does the "score" in "compute line l: parameter value -> score"
refer to (i) the MT evaluation metric score (e.g. BLEU) between the
translation and the reference sentence, or (ii) the weighted overall model
score that appears in the last column of a Moses-generated n-best list (e.g.
http://www.statmt.org/moses/?n=Advanced.Search)?

If it is BLEU, is it true that these sentence-level scores of the n-best
list can be pre-calculated before getting into the Powell search?

At line 8 of the pseudo code, when it asks to "find line l with steepest
descent", is it looking, for each sentence, for (i) the line with the
highest λj or (ii) the line with the highest g(ei|f)?

If line 8 of the pseudo code is to find the line with the highest λj, then
when "computing the line l: parameter value -> score", should we also do
something like this:

( g(ei|f) - ∑k∈ 1,...,j-1,j+1,...,|λ| λkhk(ei,f) ) / hj(ei,f)

to find the line with the highest λj?

Then at line 15 of the pseudo code, it says "compute score for value before
first threshold point". Is this "score" different from the "score" at line
6? At line 6, it's a sentence-level score (which I hope it means BLEU and
not the weighted overall score), and at line 15, it seems to be computing
the corpus-level score given the initial parameter values.

If line 15 is computing the corpus-level score, is it only taking the best
score of the n translations for each reference? And if this is BLEU, it is
not simply averaging the sentence-level BLEU scores that might be kept from
line 6, is that right? If it is BLEU, then this score could be pre-computed
before the Powell search too, right?
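
To make the terms in these questions concrete, here is a minimal Python
sketch of one reading of lines 8-13 of the pseudo code for a single
sentence. It assumes that the "score" on each line is the model score
g(ei|f) = λj*ai + bi (slope ai, intercept bi, as in Adam's explanation
quoted below), and that "steepest descent" means the line with the smallest
slope, i.e. the hypothesis that wins as λj goes to minus infinity. Both are
assumptions on my part, not something the slides confirm, and the function
name threshold_points is made up for illustration.

```python
from typing import List, Tuple

Line = Tuple[float, float]  # (slope a_i, intercept b_i) of one n-best hypothesis


def threshold_points(lines: List[Line]) -> List[Tuple[float, int]]:
    """Walk the upper envelope of the lines from left to right.

    Returns a list of (parameter value, index of the hypothesis that becomes
    the highest-scoring one to the right of that value).
    """
    # Assumed meaning of "steepest descent": smallest slope wins as the
    # parameter goes to -infinity; ties are broken by the larger intercept.
    cur = min(range(len(lines)), key=lambda i: (lines[i][0], -lines[i][1]))
    points: List[Tuple[float, int]] = []
    while True:
        a, b = lines[cur]
        best_x, best_i = None, None
        for i, (a2, b2) in enumerate(lines):
            if a2 <= a:
                continue  # only a steeper line can overtake further right
            x = (b - b2) / (a2 - a)  # where line i crosses the current line
            # keep the earliest crossing; on a tie prefer the steeper line
            if best_x is None or x < best_x or (x == best_x and a2 > lines[best_i][0]):
                best_x, best_i = x, i
        if best_i is None:
            break  # no steeper line left: the current line wins up to +infinity
        points.append((best_x, best_i))
        cur = best_i
    return points


# Toy lines, invented for illustration (not taken from any n-best list)
toy = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, -5.0)]
print(threshold_points(toy))
```

On the toy input, the fourth line never reaches the top, so it contributes
no threshold point. Under this reading, the per-sentence points would then
be pooled and sorted across sentences (lines 14 onwards), which is the part
my questions about the corpus-level score are about.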

I'm sorry for the many questions and requests for clarification. Thanks in
advance for the tips and answers!

Regards,
Liling

On Thu, Dec 10, 2015 at 10:29 AM, Adam Lopez <alo...@inf.ed.ac.uk> wrote:

> Hi Liling –
>
>
>> We are going through the slides for MT tuning on
>> http://mt-class.org/jhu/slides/lecture-tuning.pdf and we couldn't figure
>> out what "λai + bi" on slide 31 refers to.
>>
>> What are the values for "ai" and "bi"? Are they numbers from the nbest
>> list?
>>
>
> [For clarity, I'm going to change the notation slightly here: slide 23
> uses λ to refer to the parameter vector (indexed by i), while slide 31 uses
> it to refer to a single parameter (i.e. an element of this vector). This is
> confusing. Let's use λ as the parameter vector, |λ| as its length, and λj as
> its j-th element, which is what we're optimizing in slides 31-36 (since i
> is already used on slide 31 to index elements of the n-best list, λi would
> be confusing). I'm also going to use g(ei|f) rather than p(ei|f) since
> this is just a linear model; we aren't doing probabilistic inference here.]
>
> We're going to compute g(ei|f) as a function of a single parameter λj
> while holding all other parameters fixed. This is just:
>
> g(ei|f) = ∑k∈ 1,...,|λ| λkhk(ei,f)
>         = λjhj(ei,f) + ∑k∈ 1,...,j-1,j+1,...,|λ| λkhk(ei,f)
>
> Hence ai = hj(ei,f) and bi = ∑k∈ 1,...,j-1,j+1,...,|λ| λkhk(ei,f). In
> other words, ai is just the value of the j-th feature on the i-th element
> of the n-best list, and bi is the model score according to all other
> features and weights.
>
>> According to the algorithm on slide 37 of
>> http://mt-class.org/jhu/slides/lecture-tuning.pdf, is line 6 where
>> the λai + bi computation occurs?
>>
>> compute line l: parameter value → score
>>
>>
> Yes.
>
>> From the n-best list we have lines such as:
>>
>> 0 ||| including options , чтобы buy 20 больше planes , соотношении volume
>> - в том 26 миллиардов долларов .  ||| LexicalReordering0= -3.12525 0 0
>> -7.34869 0 0 Distortion0= 0 LM0= -111.207 WordPenalty0= -18 PhrasePenalty0=
>> 17 TranslationModel0= -12.8271 -8.45991 -11.4888 -11.3076 ||| -746.163
>>
>> Let's say we are tuning the first parameter of LexicalReordering0 for
>> this sentence. Is it that we only calculate:
>>
>> λ -3.12525 * -746.163
>>
>>
>> Is ai = -3.12525 for this sentence? Is bi = -746.163? What is bi supposed
>> to be?
>>
>
> From the above, bi is a function of the remaining features and weights,
> so you need to know your current weight vector to compute it.
>
> -A
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
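
A minimal Python sketch of the decomposition described in the quoted reply,
applied to the n-best entry above with LexicalReordering0's first value as
the feature being tuned (j = 0). The feature values are copied from that
entry; the weight vector is entirely hypothetical (the thread does not give
one), so the result will not reproduce the -746.163 total. The helper name
line_coefficients is made up for illustration.

```python
from typing import List, Tuple


def line_coefficients(features: List[float], weights: List[float], j: int) -> Tuple[float, float]:
    """Return (a_i, b_i) such that g(e_i|f) = weights[j] * a_i + b_i."""
    a_i = features[j]  # value of the j-th feature on this hypothesis
    # contribution of every other feature under the current weights
    b_i = sum(w * h for k, (w, h) in enumerate(zip(weights, features)) if k != j)
    return a_i, b_i


# Feature values from the quoted n-best entry, flattened in order
features = [
    -3.12525, 0.0, 0.0, -7.34869, 0.0, 0.0,   # LexicalReordering0
    0.0,                                      # Distortion0
    -111.207,                                 # LM0
    -18.0,                                    # WordPenalty0
    17.0,                                     # PhrasePenalty0
    -12.8271, -8.45991, -11.4888, -11.3076,   # TranslationModel0
]

# HYPOTHETICAL weights, one per feature value above; not from the thread
weights = [0.3] * 6 + [0.3, 0.5, -1.0, 0.2] + [0.2] * 4

a_i, b_i = line_coefficients(features, weights, j=0)

# Sanity check: the line evaluated at the current weight gives the full score
g = sum(w * h for w, h in zip(weights, features))
assert abs(weights[0] * a_i + b_i - g) < 1e-9
print(a_i, b_i)
```

The point of the sketch is only that ai comes straight from the feature
column, while bi has to be recomputed whenever any of the other weights
changes.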

