Dear Adam and Moses devs/users,

@Adam, thank you for the explanation of line 6 of the pseudocode. I understand it better now.
I have a few more short questions about the pseudocode for the Powell search on slide 37 of http://mt-class.org/jhu/slides/lecture-tuning.pdf.

On line 6, does the "score" in "compute line l: parameter value -> score" refer to (i) the MT evaluation metric score (e.g. BLEU) between the translation and the reference sentence, or (ii) the weighted overall model score of an n-best entry, as in the last column of a Moses-generated n-best list (e.g. http://www.statmt.org/moses/?n=Advanced.Search)? If it is BLEU, is it true that these sentence-level scores for the n-best list can be precomputed before entering the Powell search?

On line 8, when the pseudocode says "find line l with steepest descent", does it mean, for each sentence, finding (i) the line with the highest λ_j or (ii) the line with the highest g(e_i|f)? If line 8 means finding the line with the highest λ_j, then when "computing line l: parameter value -> score" we should also compute something like

$$\left[ g(e_i|f) - \sum_{k \ne j} \lambda_k h_k(e_i,f) \right] / h_j(e_i,f)$$

to find the line with the highest λ_j, right?

Line 15 says "compute score for value before first threshold point". Is this "score" different from the "score" on line 6? On line 6 it is a sentence-level score (which I hope means BLEU and not the weighted overall score), whereas on line 15 it seems to be the corpus-level score given the initial parameter values. If line 15 computes the corpus-level score, does it take only the best-scoring of the n translations for each reference? And if the score is BLEU, it is not simply an average of the sentence-level BLEU scores kept from line 6, is that right? If it is BLEU, could this score also be precomputed before the Powell search?

I'm sorry for the many questions and requests for clarification. Thanks in advance for the tips and answers!
Regards,
Liling

On Thu, Dec 10, 2015 at 10:29 AM, Adam Lopez <alo...@inf.ed.ac.uk> wrote:
> Hi Liling –
>
>> We are going through the slides for MT tuning on
>> http://mt-class.org/jhu/slides/lecture-tuning.pdf and we couldn't figure
>> out what "λa_i + b_i" on slide 31 refers to.
>>
>> What are the values for "a_i" and "b_i"? Are they numbers from the n-best
>> list?
>
> [For clarity, I'm going to change the notation slightly here: slide 23
> uses λ to refer to the parameter vector (indexed by i), while slide 31 uses
> it to refer to a single parameter (i.e. an element of this vector). This is
> confusing. Let's use λ as the parameter vector, |λ| as its length, and λ_j as
> its j-th element, which is what we're optimizing in slides 31-36 (since i
> is already used on slide 31 to index elements of the n-best list, λ_i would
> be confusing). I'm also going to use g(e_i|f) rather than p(e_i|f) since
> this is just a linear model; we aren't doing probabilistic inference here.]
>
> We're going to compute g(e_i|f) as a function of a single parameter λ_j
> while holding all other parameters fixed. This is just:
>
> $$g(e_i|f) = \sum_{k=1}^{|\lambda|} \lambda_k h_k(e_i,f) = \lambda_j h_j(e_i,f) + \sum_{k \ne j} \lambda_k h_k(e_i,f)$$
>
> Hence $a_i = h_j(e_i,f)$ and $b_i = \sum_{k \ne j} \lambda_k h_k(e_i,f)$. In
> other words, a_i is just the value of the j-th feature on the i-th element
> of the n-best list, and b_i is the model score according to all other
> features and weights.
>
>> According to the algorithm on slide 37 of
>> http://mt-class.org/jhu/slides/lecture-tuning.pdf, is line 6 where
>> the λa_i + b_i computation occurs?
>>
>> compute line l: parameter value → score
>
> Yes.
>
>> From the n-best list we have lines such as:
>>
>> 0 ||| including options , чтобы buy 20 больше planes , соотношении volume - в том 26 миллиардов долларов . ||| LexicalReordering0= -3.12525 0 0 -7.34869 0 0 Distortion0= 0 LM0= -111.207 WordPenalty0= -18 PhrasePenalty0= 17 TranslationModel0= -12.8271 -8.45991 -11.4888 -11.3076 ||| -746.163
>>
>> Let's say we are tuning the first parameter for LexicalReordering0 for
>> this sentence. Is it that we only calculate:
>>
>> λ -3.12525 * -746.163
>>
>> Is a_i = 3.12525 for this sentence? Is b_i = -746.163? What is b_i supposed
>> to be?
>
> From the above, b_i is a function of the remaining features and weights,
> so you need to know your current weight vector to compute it.
>
> -A
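A minimal Python sketch of Adam's definitions applied to the n-best entry above, assuming the plain dense-feature Moses n-best format (id ||| hypothesis ||| feature field ||| total score) and a weight vector listed in the same order as the feature values. `feature_values`, `line_coefficients`, and the all-1.0 weights are hypothetical placeholders, not Moses code or real tuned weights.

```python
def feature_values(field):
    """Flatten the feature field of a Moses n-best entry, e.g.
    'LM0= -111.207 WordPenalty0= -18', into [-111.207, -18.0] (file order)."""
    return [float(tok) for tok in field.split() if not tok.endswith("=")]


def line_coefficients(nbest_line, weights, j):
    """Return (a_i, b_i) for one n-best entry when tuning the j-th weight:
    a_i = h_j(e_i,f) and b_i = sum over k != j of weights[k] * h_k(e_i,f)."""
    fields = [f.strip() for f in nbest_line.split("|||")]
    h = feature_values(fields[2])  # fields: id, hypothesis, features, total score
    if len(h) != len(weights):
        raise ValueError("need one weight per dense feature value")
    a = h[j]
    b = sum(w * x for k, (w, x) in enumerate(zip(weights, h)) if k != j)
    return a, b


line = ("0 ||| including options , чтобы buy 20 больше planes , соотношении volume "
        "- в том 26 миллиардов долларов . ||| LexicalReordering0= -3.12525 0 0 "
        "-7.34869 0 0 Distortion0= 0 LM0= -111.207 WordPenalty0= -18 "
        "PhrasePenalty0= 17 TranslationModel0= -12.8271 -8.45991 -11.4888 -11.3076 "
        "||| -746.163")
# Hypothetical weights (all 1.0), used only to show the arithmetic.
a, b = line_coefficients(line, [1.0] * 14, 0)
print(a, round(b, 4))
```

With these placeholder weights, a_i is the raw feature value -3.12525 (sign included) and b_i is the sum of the other 13 feature values, about -163.64. Assuming the last field (-746.163) is the decoder's weighted total g(e_i|f) under the weights that produced the n-best list, it equals λ_j a_i + b_i for those weights, so it is not b_i itself; with those same weights, b_i could be recovered as -746.163 - λ_j × (-3.12525).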
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support