As far as I know, exact modified Kneser-Ney smoothing (the current
state of the art) is not supported by IRSTLM. IRSTLM instead
implements modified shift-beta smoothing, which isn't quite as
effective -- especially on smaller data sets.

Cheers,
Jon


On Tue, Nov 6, 2012 at 1:08 PM, Marcin Junczys-Dowmunt
<junc...@amu.edu.pl> wrote:
> Hi,
> Slightly off-topic, but I am out of ideas. I am trying to figure out
> what set of parameters I have to use with IRSTLM to create LMs that are
> equivalent to language models created with SRILM using the following
> command:
>
> (SRILM:) ngram-count -order 5 -unk -interpolate -kndiscount -text
> input.en -lm lm.en.arpa
>
> Up to now, I am using this chain of commands for IRSTLM:
>
> perl -C -pe 'chomp; $_ = "<s> $_ </s>\n"' < input.en > input.en.sb
> ngt -i=input.en.sb -n=5 -b=yes -o=lm.en.bin
> tlm -tr=lm.en.bin -lm=sb -bo=yes -n=5 -o=lm.en.arpa
>
> I know this is not quite the same, but it comes closest in terms of
> quality and size. The translation results, however, are still
> consistently worse than with SRILM models, differences in BLEU are up to
> 1%.
>
> I use KenLM with Moses to binarize the resulting arpa files, so this is
> not a code issue.
>
> Also it seems IRSTLM has a bug with the modified shift beta option. At
> least KenLM complains that not all 4-grams are present although there
> are 5-grams that contain them.
>
> Any ideas?
> Thanks,
> Marcin
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
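
The KenLM complaint described in the quoted message (5-grams present
without the 4-grams they contain) can be checked directly on the ARPA
file. Below is a minimal sketch in Python. It assumes the complaint
refers to the context prefix (the first n-1 words) of each n-gram, and
that each entry line follows the usual ARPA layout: a log probability,
then the words, then an optional backoff weight. The function name
missing_prefixes is hypothetical and is not part of KenLM, SRILM, or
IRSTLM.

```python
import re
from collections import defaultdict


def missing_prefixes(arpa_lines):
    """Return {order: sorted list of n-grams whose (order-1)-gram prefix is absent}.

    `arpa_lines` is any iterable of text lines from an ARPA file.
    Assumption: entries look like "<log10 prob> <w1 ... wn> [<backoff>]".
    """
    ngrams = defaultdict(set)
    order = 0  # 0 means we are still in the \data\ header
    for raw in arpa_lines:
        line = raw.strip()
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            continue
        if line == "\\end\\":
            break
        if not line or order == 0 or line.startswith("\\"):
            continue
        fields = line.split()
        # fields[0] is the log probability; the next `order` fields are the words.
        words = tuple(fields[1:1 + order])
        if len(words) == order:
            ngrams[order].add(words)

    missing = {}
    for n in sorted(ngrams):
        if n < 2:
            continue
        # An n-gram is flagged when its first n-1 words are not listed at order n-1.
        bad = sorted(g for g in ngrams[n] if g[:-1] not in ngrams[n - 1])
        if bad:
            missing[n] = bad
    return missing
```

To run it on the file from the thread, something like
`with open("lm.en.arpa", encoding="utf-8") as f: print(missing_prefixes(f))`
should work. An empty result would suggest the prefixes are all present
and the problem lies elsewhere.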