Hi everybody.
I am trying to understand the purpose of using the EM algorithm for
training IBM Model 1 (the MT system described in "The Mathematics of
Statistical Machine Translation: Parameter Estimation" by Brown et al.).
To me it seems that the result will be overfitted.
By collecting the number of occurrences of word pairs (i.e. (e, f)) from
each pair of aligned sentences and normalizing just once (i.e. running
just one step of the EM algorithm), why is it that the result can't be
used? Running further iterations of the EM algorithm means that word
pairs with a higher number of co-occurrences (and thus also higher
resulting t-values) will slowly increase their t-values, taking
probability mass away from the lower-scoring pairs. But why is this a
good thing to do, and doesn't it mean that the values will eventually
converge to 1 for the single pair with the highest number of
occurrences (for each word in the source text, that is)?
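
To make the question concrete, here is a rough Python sketch of the
training loop as I understand it from the paper. The three-sentence toy
corpus, the variable names and the fixed iteration count are all just
made up for illustration, and I have left out the NULL word that the
paper adds to each English sentence:

from collections import defaultdict

# Toy parallel corpus (foreign sentence, English sentence);
# the sentences are made up for illustration.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# Initialize t(f|e) uniformly.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for iteration in range(10):
    count = defaultdict(float)  # expected counts c(f, e)
    total = defaultdict(float)  # expected counts c(e)

    # E-step: distribute each f over all e in the paired sentence,
    # weighted by the current t-values (the "normalizing" step).
    for fs, es in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c

    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e) in t:
        t[(f, e)] = count[(f, e)] / total[e]

    print(f"iter {iteration + 1}: "
          f"t(haus|house) = {t[('haus', 'house')]:.3f}, "
          f"t(das|house) = {t[('das', 'house')]:.3f}")

Printing a couple of t-values after each iteration lets me watch the
probability mass shift between word pairs from one iteration to the
next, which is exactly the behaviour my question is about.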
Any help will be appreciated.
Regards,
Michael.
--
Which is more dangerous? TV guided missiles or TV guided families?
Visit my home page at http://michael.zedeler.dk/
Get my vcard at http://michael.zedeler.dk/vcard.vcf