Hi,

I have a question about PBSMT estimation. If I understand correctly,
it is done as follows:

- first, IBM alignments are computed in both directions
- then they are combined with an alignment heuristic such as
grow-diag-final
- from this we create all possible phrase pairs, with some restrictions
(no alignment link may point outside the phrase pair, etc.)
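
To make clear what I mean by the last step, here is a minimal sketch of
the textbook consistency check as I understand it. It is not taken from
the Moses code; the function name, the 0-based index pairs and the
max_len parameter are my own assumptions for illustration.

```python
def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=7):
    """Return all phrase pairs consistent with a word alignment.

    alignment is a set of (source_index, target_index) links, 0-based.
    Each phrase pair is ((s_start, s_end), (t_start, t_end)), inclusive.
    Hypothetical helper, not the Moses implementation.
    """
    tgt_aligned = {t for _, t in alignment}
    pairs = set()
    for s_start in range(src_len):
        for s_end in range(s_start, min(s_start + max_len, src_len)):
            # Target positions linked to the current source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a span with no links yields no phrase pair
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word inside the span may be linked
            # to a source word outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            # Also allow unaligned target words at either edge.
            lo = t_start
            while True:
                hi = t_end
                while True:
                    if hi - lo + 1 <= max_len:
                        pairs.add(((s_start, s_end), (lo, hi)))
                    hi += 1
                    if hi >= tgt_len or hi in tgt_aligned:
                        break
                lo -= 1
                if lo < 0 or lo in tgt_aligned:
                    break
    return pairs


# Toy case: word 0 links to word 0, word 1 is unaligned on both sides,
# word 2 links to word 2.
pairs = extract_phrase_pairs(3, 3, {(0, 0), (2, 2)})
print(len(pairs))
```

Note that this sketch only enumerates phrase pairs for one sentence
pair; it does not enumerate segmentations, which is what my questions
below are about.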

My questions are:

1) How are the phrase pairs counted for estimation? Will PBSMT include
all segmentations that can be created under these restrictions and count
the phrase pairs in each segmentation (uniform over all segmentations)?

2) If this is the case, how does it deal with bottleneck sentences that
have a lot of null alignments or a lot of smallest-possible phrases? For
instance, take a long bitext sentence where the alignment maps the first
source word to the first target word, then a null, then the third source
word to the third target word, etc.
In this example the space of all possible segmentations is very large.
How does PBSMT deal with this?

Thanks for the reply.
Sanne

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
