On Tue, Oct 26, 2010 at 9:43 AM, <sa...@kortec.nl> wrote:
> Hi,
>
> I have a question about PBSMT estimation. If I understand it correctly
> this is done in the following manner:
>
> - first IBM alignments in both directions
> - then an alignment heuristic such as grow-diag-final
> - from this we create all possible phrase pairs, with some restrictions
> (nothing going outward etc.)

This is by far the most common approach; for instance, it is used in Moses.
A variety of other approaches have also been tried.

> 1) how are the phrase pairs counted for estimation? Will PBSMT include
> all segmentations that can be created under these restrictions and count
> the phrase pairs in each segmentation (uniform over all segmentations)?

There is no global inference in this method; a phrase pair is counted once
if it is observed to be consistent with the alignment, regardless of how
many segmentations it occurs in.

> 2) If this is the case, how does it deal with bottleneck sentences which
> have a lot of null alignments or a lot of smallest possible phrases. For
> instance in a long bitext sentence where the alignment will map the first
> source word to the first target word, then a null, then the third source
> to the third target word etc.
> In this example the search over all possible segmentations is very large,
> how does PBSMT deal with this?

Since there is no global inference, it isn't necessary to enumerate
segmentations. The number of source phrases is at most quadratic in the
sentence length, and each one aligns to zero, one, or possibly a small
number of target phrases (depending on the treatment of null alignments).
You can easily enumerate them all and simply extract the corresponding
target phrases. This is what Moses does.

For the global inference view, see this paper, which influenced a great
deal of subsequent research:
http://aclweb.org/anthology-new/W/W02/W02-1018.pdf

Cheers
Adam
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
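
To make the extraction described in the reply concrete, here is a minimal
Python sketch. It is an illustration only, not Moses code: the function
names, the max_len limit of 7, and the choice to grow the target span over
adjacent unaligned words are all assumptions made for this example. It
enumerates every source span, finds the target span its alignment links
cover, keeps the pair if no link crosses the span boundary, and counts each
extracted occurrence once. No segmentation is ever enumerated.

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """List the phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens.
    alignment: set of (src_index, tgt_index) links.
    max_len: hypothetical cap on phrase length (an assumption of this sketch).
    """
    tgt_aligned = {j for _, j in alignment}
    pairs = []
    # Quadratic number of source spans (fewer once max_len applies).
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word inside the source span.
            linked = [j for i, j in alignment if s_start <= i <= s_end]
            if not linked:
                continue  # fully unaligned source span: nothing to extract
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word in the span links outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for i, j in alignment):
                continue
            # Assumed null treatment: also grow over adjacent unaligned target words.
            lo = t_start
            while lo >= 0 and (lo == t_start or lo not in tgt_aligned):
                hi = t_end
                while hi < len(tgt) and (hi == t_end or hi not in tgt_aligned):
                    if hi - lo + 1 <= max_len:
                        pairs.append((" ".join(src[s_start:s_end + 1]),
                                      " ".join(tgt[lo:hi + 1])))
                    hi += 1
                lo -= 1
    return pairs


def relative_frequencies(corpus, max_len=7):
    """Estimate p(target | source) by relative frequency over extracted pairs."""
    pair_counts = Counter()
    src_counts = Counter()
    for src, tgt, alignment in corpus:
        for s, t in extract_phrase_pairs(src, tgt, alignment, max_len):
            pair_counts[(s, t)] += 1  # one count per extracted occurrence
            src_counts[s] += 1
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}


# The situation from question 2: word 1 aligned, word 2 null, word 3 aligned.
src = ["a", "b", "c"]
tgt = ["x", "y", "z"]
alignment = {(0, 0), (2, 2)}
pairs = extract_phrase_pairs(src, tgt, alignment)
probs = relative_frequencies([(src, tgt, alignment)])
```

With the toy alignment above, the unaligned words "b" and "y" never form a
pair on their own; they only appear attached to a neighbouring aligned
phrase, which is one possible treatment of null alignments.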