Hi there,

I have a question about how the lexical weighting model is calculated. For a phrase pair that has several different alignments, how does Moses compute the lexical weighting score?
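To make the question concrete, here is a minimal sketch of the strategy as I understand it from the book: score every candidate alignment of a phrase pair and keep the one with the highest score. This is only my own illustration, not Moses code. The function name `lexical_weight`, the table `w`, and the "NULL" entry for unaligned words are assumptions made up for this message; the probabilities are the ones from the example below.

```python
from collections import defaultdict

def lexical_weight(src, tgt, alignment, w):
    """lex(f|e, a): for each source word, average w(f_i|e_j) over its
    aligned target words, then multiply the averages together."""
    links = defaultdict(list)
    for i, j in alignment:
        links[i].append(j)
    score = 1.0
    for i, f in enumerate(src):
        if links[i]:
            score *= sum(w[(f, tgt[j])] for j in links[i]) / len(links[i])
        else:
            # Assumption: unaligned source words are scored against NULL.
            score *= w[(f, "NULL")]
    return score

# Word translation probabilities w(f|e) taken from the example below.
w = {("le", "it"): 0.0330916, ("le", "the"): 0.1952182}
src, tgt = ["le"], ["it", "the"]

# The two alignments found for (le ||| it the), as (source, target) index pairs.
candidates = [[(0, 0), (0, 1)], [(0, 1)]]

scores = {tuple(a): lexical_weight(src, tgt, a, w) for a in candidates}
best = max(scores, key=scores.get)  # alignment with the maximal score
```

Under this reading, `best` is the alignment 0-1, which is not the one Moses picked.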
For example, in a (fr-en) corpus there is a phrase pair (le ||| it the), and I can find two alignments for it given by GIZA++:

* 0-0 0-1
* 0-1

The strategy described in Philipp Koehn's book (2010) is to calculate the lexical weighting score for each possible alignment and to take the one with the maximal score.

For the first alignment (0-0 0-1), the lexical weighting score is:

lex(f|e) = (w(le|it) + w(le|the)) / 2 = (0.0330916 + 0.1952182) / 2 = 0.114155

For the second alignment (0-1), it is:

lex(f|e) = w(le|the) = 0.1952182

So according to the book, we should take the second alignment as the alignment for this phrase pair (le ||| it the). However, Moses took the first one (0-0 0-1).

Does Moses consider different alignments for a phrase pair? If yes, how does Moses choose the best alignment? If no, which alignment will Moses take (the first one, the most frequent one, or some other strategy)?

Also, I'd be interested to hear about any experience with the potential impact of each strategy.

Sincerely,
--
Gong Li
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
