Hi. I have a question which is not directly about Moses but more generally about phrase extraction in phrase-based statistical machine translation. I hope it is not considered off-topic! I haven't been able to easily locate a satisfactory answer.
In state-of-the-art phrase-based machine translation, once a sentence pair has been word-aligned, all possible phrase pairs are extracted, and each of them is assumed to have been seen exactly once. Counts are collected over all sentence pairs in the training corpus and then used to compute a crude estimate of the translation probability Phi(f|e); see Philipp Koehn's book 'Statistical Machine Translation', p. 136, eq. (5.4).

I was thinking about the possibility that Philipp himself hints at after this equation, that is:

- considering each possible segmentation that completely (perfectly) covers *both* the source sentence and the target sentence,
- counting how many such complete coverings there are for that sentence pair,
- considering all of them equally likely and assigning the corresponding "fractional counts" to the phrase pairs used in each covering, and
- using the fractional counts to obtain a better estimate of Phi(f|e).

This estimate could be iteratively refined by using it to estimate the likelihood of each covering, in a sort of "poor man's" expectation maximization, cruder than the alignment-less "rich man's" EM phrase extraction by Marcu and Wong (2002) or the alignment-constrained EM phrase extraction by Birch, Callison-Burch and Koehn (2006).

The "fractional counts" idea looks like something that could be done easily, but before I explore it further I would appreciate it very much if someone on this list could tell me whether it has been done.

Thanks a million!

Mikel

Mikel L. Forcada <[email protected]>
Dept. Llenguatges i Sistemes Informàtics
Universitat d'Alacant, E-03071 Alacant (Spain)
Tel.: +34 96 590 9776
Fax: +34 96 590 9326
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
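
P.S. To make the "fractional counts" idea concrete, here is a minimal Python sketch for a single sentence pair. It is only an illustration under stated assumptions, not code from Moses or from the book: the function names (coverings, fractional_counts) are hypothetical, the phrase pairs are assumed to have been extracted already and are given as half-open word spans, and every complete covering gets the same weight, 1/(number of coverings).

```python
from collections import Counter
from fractions import Fraction


def coverings(pairs, n_src, n_tgt):
    """Yield every selection of phrase pairs that tiles both sentences exactly.

    pairs: iterable of ((s_start, s_end), (t_start, t_end)), half-open,
    non-empty word spans (an assumption of this sketch).
    """
    full = (1 << n_tgt) - 1          # bitmask with every target word covered
    by_start = {}                    # phrase pairs indexed by source start
    for pair in pairs:
        by_start.setdefault(pair[0][0], []).append(pair)

    def extend(pos, used, chosen):
        # pos: next uncovered source position; used: covered target words.
        if pos == n_src:
            if used == full:
                yield tuple(chosen)
            return
        for pair in by_start.get(pos, []):
            (_, s_end), (t_start, t_end) = pair
            if s_end <= pos or t_end <= t_start:
                continue             # skip empty spans
            mask = ((1 << (t_end - t_start)) - 1) << t_start
            if used & mask == 0:     # target span must not overlap earlier ones
                yield from extend(s_end, used | mask, chosen + [pair])

    yield from extend(0, 0, [])


def fractional_counts(pairs, src, tgt):
    """Return (counts, number_of_coverings) for one sentence pair.

    Each covering is taken to be equally likely, so a phrase pair receives
    1/number_of_coverings for every covering it takes part in.
    """
    all_coverings = list(coverings(pairs, len(src), len(tgt)))
    counts = Counter()
    if not all_coverings:
        return counts, 0
    weight = Fraction(1, len(all_coverings))
    for covering in all_coverings:
        for (s0, s1), (t0, t1) in covering:
            key = (" ".join(src[s0:s1]), " ".join(tgt[t0:t1]))
            counts[key] += weight
    return counts, len(all_coverings)


# Toy sentence pair with the alignment la-the, casa-house.
src = ["la", "casa"]
tgt = ["the", "house"]
pairs = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((0, 2), (0, 2))]
counts, n = fractional_counts(pairs, src, tgt)
for phrase_pair, count in sorted(counts.items()):
    print(phrase_pair, count)
```

In this toy case there are two complete coverings ("la casa" as one phrase, or "la" and "casa" separately), so each of the three phrase pairs gets a fractional count of 1/2 instead of the count of 1 it would get from the usual extraction. The iterative refinement mentioned above would replace the uniform weight with a weight proportional to the product of the current Phi estimates of the phrase pairs in each covering.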
