Hi,

I have a question about PBSMT estimation. If I understand it correctly, it is done in the following manner:

- first, IBM alignments are computed in both directions
- then an alignment heuristic such as grow-diag-final is applied
- from the resulting alignment we create all possible phrase pairs, with some restrictions (no alignment links going outside the phrase pair, etc.)

My questions are:

1) How are the phrase pairs counted for estimation? Does PBSMT include all segmentations that can be created under these restrictions and count the phrase pairs in each segmentation (uniform over all segmentations)?

2) If this is the case, how does it deal with bottleneck sentences that have a lot of null alignments or a lot of smallest-possible phrases? For instance, take a long bitext sentence where the alignment maps the first source word to the first target word, then a null, then the third source word to the third target word, etc. In this example the space of possible segmentations is very large. How does PBSMT deal with this?

Thanks for the reply.

Sanne
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
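
P.S. To make the extraction step I am asking about concrete, here is a minimal Python sketch of how I understand it. It is only an illustration of the usual "consistent with the alignment" restriction, not the actual Moses code: the function name `extract_phrase_pairs`, the `max_len` parameter, and the toy sentence pair are all my own assumptions. Unaligned target words at the edges of a phrase are allowed to attach to it.

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """Return all phrase pairs consistent with a word alignment.

    src, tgt   -- lists of tokens
    alignment  -- set of (source_index, target_index) links
    max_len    -- hypothetical cap on phrase length (both sides)
    """
    pairs = []
    aligned_tgt = {j for _, j in alignment}
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for i, j in alignment if s_start <= i <= s_end]
            if not linked:
                continue  # require at least one alignment link
            t_min, t_max = min(linked), max(linked)
            if t_max - t_min + 1 > max_len:
                continue
            # Consistency: no link may leave the target span towards
            # a source word outside the source span.
            if any(t_min <= j <= t_max and not (s_start <= i <= s_end)
                   for i, j in alignment):
                continue
            # Grow the target span over adjacent unaligned words.
            t_lo = t_min
            while True:
                t_hi = t_max
                while True:
                    if t_hi - t_lo + 1 <= max_len:
                        pairs.append((" ".join(src[s_start:s_end + 1]),
                                      " ".join(tgt[t_lo:t_hi + 1])))
                    t_hi += 1
                    if t_hi >= len(tgt) or t_hi in aligned_tgt:
                        break
                t_lo -= 1
                if t_lo < 0 or t_lo in aligned_tgt:
                    break
    return pairs


# Toy version of the example above: word 1 aligned, word 2 null, word 3 aligned.
toy_pairs = extract_phrase_pairs("a b c".split(), "x y z".split(),
                                 {(0, 0), (2, 2)})
toy_counts = Counter(toy_pairs)  # one count per extracted pair
```

In this sketch each consistent phrase pair is collected once per sentence pair by looping over spans, so no segmentations are enumerated at all. Whether that matches how the counts are actually collected is exactly what I would like to confirm.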
