Hi,

According to the literature, as long as the probability of each word-pair alignment is set to the same non-zero value in the initialisation stage of IBM Model 1, the algorithm will converge to the same local maximum.
In keeping with this principle, the GIZA++ code initialises each alignment to the same uniform probability (1 / number of distinct target words). However, it is clear that many source words in a corpus (take, for example, words that occur only once in the entire corpus) co-occur with only a subset of the target words in the corpus. In fact, this is reflected in the GIZA++ code that creates the initial TTable data structure: it only adds co-occurrences that actually exist in the bilingual corpus it is given, and rightly so.

Would it not, therefore, make more sense to initialise each alignment probability to 1 / the number of target words that the source word actually co-occurs with? Obviously, such an approach would destroy the guarantee of EM converging to the same local maximum. But wouldn't that local maximum, at least in theory, be a better one to arrive at? Has anyone experimented with this kind of approach before?

James

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
