Hi all,

IRSTLM uses simpler smoothing methods than SRILM. In particular, improved Kneser-Ney smoothing in SRILM uses corrected frequencies for lower-order n-grams, while IRSTLM does not. (This indeed results in fewer n-grams.) The reason is that introducing corrected frequencies makes it hard to distribute LM training over many machines, at least with our implementation. The idea of the free IRSTLM tool is to permit the estimation of huge LMs by giving higher priority to efficiency than to precision. More precise but smaller LMs can nevertheless be estimated with SRILM and handled efficiently at run time with IRSTLM.
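To make the distinction concrete, here is a toy sketch of raw versus corrected frequencies. It is not code from SRILM or IRSTLM. It assumes the usual Kneser-Ney reading of "corrected frequency", namely the number of distinct words seen immediately to the left of an n-gram, and the function names and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def raw_counts(tokens, order):
    """Plain occurrence counts of every n-gram of the given order."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )


def corrected_counts(tokens, order):
    """Assumed Kneser-Ney style corrected counts: for each n-gram, the
    number of distinct words that occur immediately before it."""
    left_contexts = defaultdict(set)
    # Start at 1 so that every n-gram considered has a left neighbour.
    for i in range(1, len(tokens) - order + 1):
        ngram = tuple(tokens[i:i + order])
        left_contexts[ngram].add(tokens[i - 1])
    return {ngram: len(ctx) for ngram, ctx in left_contexts.items()}


# Hypothetical toy corpus.
tokens = "san francisco is in california and san francisco is foggy".split()

raw = raw_counts(tokens, 1)
corrected = corrected_counts(tokens, 1)

# "francisco" occurs twice but always after "san", so its corrected
# count is lower than its raw count.
print(raw[("francisco",)], corrected[("francisco",)])
```

Note that the corrected count of an n-gram depends on every longer n-gram that ends in it, wherever those were counted, whereas raw counts can simply be summed across independent chunks of the corpus.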
Greetings,
Marcello Federico

________________________________________
From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of John D. Burger [EMAIL PROTECTED]
Sent: Tuesday, August 05, 2008 10:02 PM
To: [email protected]
Subject: Re: [Moses-support] Trying to debug reduced performance with new Moses

Miles Osborne wrote:

> you want to also check that ngrams are not getting pruned by
> probability (in addition to counts)

Yes, in fact, this:

http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

seems to suggest that pruning is done based on not changing perplexity very much, rather than raw count or even probability.

> this whole business is a bit on the murky side and the only reason
> i know about it was when i was writing a disk-based version of
> ngram-count a year or so back

I'm starting to think it's a lost cause to try to get one LM implementation to act very much like the other. Thanks for the insights, though!

- John Burger
  MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
