Hi everybody.

I am trying to understand the purpose of using the EM algorithm for training IBM Model 1 (the MT system described in "The Mathematics of Statistical Machine Translation: Parameter Estimation" by Brown et al.). To me it seems that the result will be overfitted.

If we collect the number of occurrences of word pairs (i.e. (e, f)) from the two sentences and normalize just once (that is, run just one step of the EM algorithm), why can't that result be used as it stands? Running the EM algorithm further means that word pairs with a higher number of occurrences (and thus higher resulting t-values) will slowly increase their t-scores, taking probability mass away from the lower-scoring pairs. But why is this a good thing to do, and doesn't it mean that the values will eventually converge to 1 for the single pair with the highest number of occurrences (for each word in the source text, that is)?
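To make the question concrete, here is roughly what I have in mind, as a minimal Python sketch (the toy corpus and variable names are my own invention, not from the paper):

    from collections import defaultdict

    # Hypothetical toy parallel corpus of (English, foreign) sentence pairs.
    corpus = [
        (["the", "house"], ["das", "haus"]),
        (["the", "book"], ["das", "buch"]),
        (["a", "book"], ["ein", "buch"]),
    ]

    # Uniform (unnormalized) initialization of t(e|f); the E-step
    # normalization below makes the initial scale irrelevant.
    t = defaultdict(lambda: 1.0)

    for iteration in range(10):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # normalizer per source word f
        for e_sent, f_sent in corpus:
            for e in e_sent:
                # Each English word spreads one unit of count mass over
                # the source words in its sentence, in proportion to the
                # current t-values.
                z = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    delta = t[(e, f)] / z
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalize the expected counts to get the new t(e|f).
        for (e, f) in count:
            t[(e, f)] = count[(e, f)] / total[f]

After the first pass, the t-values are exactly the normalized co-occurrence counts I described above; it is the effect of the subsequent passes that I don't understand.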

Any help will be appreciated.

Regards,

Michael.

--
Which is more dangerous? TV guided missiles or TV guided families?
Visit my home page at http://michael.zedeler.dk/
Get my vcard at http://michael.zedeler.dk/vcard.vcf
