Hi,

According to the literature, as long as the probability of each word-pair  
alignment is set to the same non-zero value in the initialisation  
stage of IBM Model 1, the algorithm will converge to the same  
local maximum.

In keeping with this principle, the GIZA++ code initialises each  
alignment to the same uniform probability (1 / the number of distinct  
target words). However, it is clear that many source words in a corpus  
(take, for instance, words which occur only once in the entire  
corpus) co-occur with only a subset of the target words in the corpus.  
In fact, this is reflected in the GIZA++ code which creates the  
initial TTable data structure: it only adds co-occurrences that  
actually exist in the bilingual corpus it is provided with, and rightly  
so.
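To make the standard scheme concrete, here is a toy Python sketch of what I understand the initialisation to do (the corpus, variable names, and data structures here are mine for illustration, not the actual GIZA++ C++ code):

```python
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Record only the (source, target) pairs that actually co-occur,
# as the TTable construction does.
cooc = defaultdict(set)
for src, tgt in corpus:
    for s in src:
        cooc[s].update(tgt)

# Standard uniform initialisation: every stored pair gets
# 1 / |target vocabulary|, regardless of how many target words
# the source word actually co-occurs with.
target_vocab = {t for _, tgt in corpus for t in tgt}
t_prob = {(s, t): 1.0 / len(target_vocab)
          for s, ts in cooc.items() for t in ts}
```

Note that even though "das" co-occurs with only three of the four target words, each of its entries still gets probability 1/4, so its row of the table does not sum to one before the first E-step.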

Would it not, therefore, make more sense to initialise each alignment  
probability to 1 / the number of target words that the source word  
actually co-occurs with? Obviously, such an approach would destroy the  
guarantee of EM converging to the same local maximum. But wouldn't  
that local maximum, at least in theory, be a better one to arrive at?  
Has anyone ever experimented with this kind of approach before?
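In toy form, the initialisation I am proposing would look something like this (again a sketch with made-up names and a made-up corpus, not GIZA++ code):

```python
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Record only the (source, target) pairs that actually co-occur.
cooc = defaultdict(set)
for src, tgt in corpus:
    for s in src:
        cooc[s].update(tgt)

# Proposed initialisation: for each source word, spread the
# probability mass uniformly over only the target words it
# actually co-occurs with, so each row sums to one.
t_prob = {(s, t): 1.0 / len(ts)
          for s, ts in cooc.items() for t in ts}
```

Here "das" co-occurs with three target words, so each of its entries gets 1/3, while "ein" co-occurs with two and gets 1/2 per entry; unlike the flat 1/|vocabulary| scheme, every source word's distribution is properly normalised from the start.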

James

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
