As far as I know, exact modified Kneser-Ney smoothing (the current state of the art) is not supported by IRSTLM. IRSTLM instead implements modified shift-beta smoothing, which is not quite as effective, especially on smaller data sets.
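For reference, this is roughly how modified Kneser-Ney estimates its three discounts (D1, D2, D3+) from count-of-counts, using Chen and Goodman's closed-form estimates. It is a minimal sketch for illustration only, not code taken from SRILM or IRSTLM. The function names and the toy counts are hypothetical. It also skips the continuation counts that real Kneser-Ney applies to lower orders.

```python
from collections import Counter


def count_of_counts(counts):
    """Return n[r] = number of distinct n-grams seen exactly r times, for r = 1..4."""
    freq = Counter(counts.values())
    return {r: freq.get(r, 0) for r in range(1, 5)}


def modified_kn_discounts(counts):
    """Closed-form modified Kneser-Ney discounts (D1, D2, D3+) for one n-gram order.

    `counts` maps each n-gram of a single order to its count. This sketch
    assumes raw counts; a full implementation would use continuation counts
    for the lower orders.
    """
    n = count_of_counts(counts)
    if min(n.values()) == 0:
        # The estimates divide by n1, n2 and n3, so all must be non-zero.
        raise ValueError("need at least one n-gram with each count 1..4")
    y = n[1] / (n[1] + 2 * n[2])
    d1 = 1 - 2 * y * n[2] / n[1]
    d2 = 2 - 3 * y * n[3] / n[2]
    d3_plus = 3 - 4 * y * n[4] / n[3]
    return d1, d2, d3_plus


# Hypothetical bigram counts: n1 = 4, n2 = 2, n3 = 1, n4 = 1.
toy = {"a b": 1, "b c": 1, "c d": 1, "d e": 1,
       "e f": 2, "f g": 2, "g h": 3, "h i": 4}
print(modified_kn_discounts(toy))  # (0.5, 1.25, 1.0)
```

The key point is that an n-gram's discount depends on whether it was seen once, twice, or three or more times. Smoothing methods that use a single discount per order do not make this distinction.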
Cheers,
Jon

On Tue, Nov 6, 2012 at 1:08 PM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
> Hi,
>
> Slightly off-topic, but I am out of ideas. I am trying to figure out
> what set of parameters I have to use with IRSTLM to create LMs that are
> equivalent to language models created with SRILM using the following
> command:
>
> (SRILM:) ngram-count -order 5 -unk -interpolate -kndiscount -text input.en -lm lm.en.arpa
>
> Up to now, I am using this chain of commands for IRSTLM:
>
> perl -C -pe 'chomp; $_ = "<s> $_ </s>\n"' < input.en > input.en.sb
> ngt -i=input.en.sb -n=5 -b=yes -o=lm.en.bin
> tlm -tr=lm.en.bin -lm=sb -bo=yes -n=5 -o=lm.en.arpa
>
> I know this is not quite the same, but it comes closest in terms of
> quality and size. The translation results, however, are still
> consistently worse than with SRILM models; differences in BLEU are up to
> 1%.
>
> I use KenLM with Moses to binarize the resulting ARPA files, so this is
> not a code issue.
>
> Also, it seems IRSTLM has a bug with the modified shift-beta option. At
> least KenLM complains that not all 4-grams are present, although there
> are 5-grams that contain them.
>
> Any ideas?
> Thanks,
> Marcin
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support