Hi all,

IRSTLM uses simpler smoothing methods than SRILM.
In particular, improved Kneser-Ney smoothing in
SRILM uses corrected frequencies for lower-order
n-grams, while IRSTLM does not. (This does indeed
result in fewer n-grams.) The reason is that
introducing corrected frequencies makes it hard
to distribute LM training over many machines, at least
with our implementation. The idea of the free IRSTLM
tool is to permit the estimation of huge LMs by giving
higher priority to efficiency than to precision.
More precise but smaller LMs can nevertheless be
estimated with SRILM and handled efficiently
at run time with IRSTLM.
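
To make the term "corrected frequencies" concrete, here is a
minimal Python sketch at the unigram level only. It is not
SRILM or IRSTLM code; the function names and the toy shards
are hypothetical. A raw count is how often a word occurs; a
corrected (continuation) count, in the Kneser-Ney sense, is the
number of distinct words that precede it. The sketch also shows
one plausible reason such counts are awkward to distribute:
raw counts from separate shards simply add up, while distinct
contexts have to be merged as sets before they are counted.

```python
from collections import Counter, defaultdict

def raw_counts(sentences):
    """Plain unigram frequencies: how often each word occurs."""
    counts = Counter()
    for sent in sentences:
        counts.update(sent)
    return counts

def left_contexts(sentences):
    """For each word, the set of distinct words seen directly before it."""
    ctx = defaultdict(set)
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            ctx[word].add(prev)
    return ctx

def corrected_counts(sentences):
    """Kneser-Ney style corrected counts: number of distinct left contexts."""
    return Counter({w: len(c) for w, c in left_contexts(sentences).items()})

# Two toy shards, as if the corpus were split across two machines.
shard_a = [["san", "francisco", "is", "far"]]
shard_b = [["san", "francisco", "is", "big"]]
corpus = shard_a + shard_b

# Raw counts can be computed per shard and added.
raw_merged = raw_counts(shard_a) + raw_counts(shard_b)

# Adding per-shard corrected counts double-counts shared contexts;
# the value over the whole corpus needs the union of the context sets.
corrected_summed = corrected_counts(shard_a) + corrected_counts(shard_b)
corrected_global = corrected_counts(corpus)

print(raw_merged["francisco"])        # raw frequency
print(corrected_summed["francisco"])  # naive sum over shards
print(corrected_global["francisco"])  # distinct predecessors overall
```

In this toy data "francisco" occurs twice but always follows
"san", so its corrected count over the whole corpus is lower
than both its raw count and the naive per-shard sum.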


Greetings,
Marcello Federico

________________________________________
From: [EMAIL PROTECTED] [EMAIL PROTECTED] On Behalf Of John D. Burger [EMAIL PROTECTED]
Sent: Tuesday, August 05, 2008 10:02 PM
To: [email protected]
Subject: Re: [Moses-support] Trying to debug reduced performance with new Moses

Miles Osborne wrote:

> you want to also check that ngrams are not getting pruned by
> probability (in addition to counts)

Yes, in fact, this:

   http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

seems to suggest that pruning is done based on not changing
perplexity very much, rather than raw count or even probability.

> this whole business is a bit on the murky side and the only reason
> i know about it was when i was writing a disk-based version of
> ngram-count a year or so back

I'm starting to think it's a lost cause to try to get one LM
implementation to act very much like the other.  Thanks for the
insights, though!

- John Burger
   MITRE
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
