Hi,

Slightly off-topic, but I am out of ideas. I am trying to figure out which set of parameters I have to use with IRSTLM to create LMs that are equivalent to language models created with SRILM using the following command:
(SRILM:)

    ngram-count -order 5 -unk -interpolate -kndiscount -text input.en -lm lm.en.arpa

So far, I have been using this chain of commands for IRSTLM:

    perl -C -pe 'chomp; $_ = "<s> $_ </s>\n"' < input.en > input.en.sb
    ngt -i=input.en.sb -n=5 -b=yes -o=lm.en.bin
    tlm -tr=lm.en.bin -lm=sb -bo=yes -n=5 -o=lm.en.arpa

I know this is not quite the same, but it comes closest in terms of quality and size. The translation results, however, are still consistently worse than with the SRILM models; the BLEU differences are up to 1%. I use KenLM with Moses to binarize the resulting ARPA files, so this is not a code issue.

Also, it seems IRSTLM has a bug with the modified shift-beta option. At least KenLM complains that not all 4-grams are present, although there are 5-grams that contain them.

Any ideas?

Thanks,
Marcin

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
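One difference worth looking at is the discounting. SRILM's -kndiscount selects Chen and Goodman's modified Kneser-Ney discounting, which estimates three discounts per order (for n-grams seen once, twice, and three or more times) from the count-of-counts. The Python sketch below computes those three discounts. It applies the formulas to raw counts only. In a full Kneser-Ney model, the lower orders use continuation counts instead. Whether IRSTLM's sb option estimates equivalent discounts is an assumption I have not checked, and the function names are hypothetical.

```python
from collections import Counter


def count_of_counts(ngram_counts):
    """Return (n1, n2, n3, n4): how many distinct n-grams occur exactly 1..4 times."""
    histogram = Counter(ngram_counts.values())
    return tuple(histogram.get(r, 0) for r in (1, 2, 3, 4))


def modified_kn_discounts(n1, n2, n3, n4):
    """Chen & Goodman's discounts D1, D2, D3+ estimated from count-of-counts.

    Assumes n1..n3 are non-zero (true for any realistically sized corpus).
    """
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1       # discount for n-grams seen once
    d2 = 2 - 3 * y * n3 / n2       # discount for n-grams seen twice
    d3_plus = 3 - 4 * y * n4 / n3  # discount for n-grams seen 3+ times
    return d1, d2, d3_plus
```

For example, with count-of-counts (100, 50, 30, 20) the discounts come out as roughly 0.5, 1.1, and 1.67. Shift-beta with a single discount per order would apply one value in place of these three.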
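The first step of the IRSTLM chain only wraps every input line in sentence-boundary markers. For anyone who prefers not to use the perl one-liner, here is a rough Python equivalent. It is a sketch: the script name and argument order are hypothetical. Python's text mode also normalizes \r\n line endings, which perl's chomp does not.

```python
import sys


def add_sentence_markers(lines):
    """Yield each line wrapped as "<s> ... </s>" plus a newline, like the perl one-liner."""
    for line in lines:
        text = line.rstrip("\n")  # strip the trailing newline, as perl's chomp does
        yield f"<s> {text} </s>\n"


# Hypothetical usage: python add_markers.py input.en input.en.sb
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.writelines(add_sentence_markers(src))
```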
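KenLM's complaint about missing 4-grams can be checked directly on the IRSTLM output. The Python script below is a hypothetical diagnostic and is not part of KenLM, IRSTLM, or Moses. For every n-gram of order 2 or higher, it reports the (n-1)-word prefix and suffix windows that are not listed among the (n-1)-grams. Checking both windows follows the description in the post; which of the two KenLM actually requires is an assumption here. The script assumes the standard ARPA layout: each line holds a log10 probability, the n words, and an optional backoff weight.

```python
import sys
from collections import defaultdict


def read_arpa_ngrams(lines):
    """Map each order n to the set of word tuples listed in its \\n-grams: section."""
    ngrams = defaultdict(set)
    order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\4-grams:" -> 4
            continue
        if line == "\\end\\":
            break
        if order is None:
            continue  # still in the \data\ header ("ngram N=count" lines)
        fields = line.split()
        ngrams[order].add(tuple(fields[1:1 + order]))  # skip the log10 prob
    return ngrams


def missing_lower_order(ngrams):
    """List (n, ngram, 'prefix'|'suffix', window) for windows absent from the (n-1)-grams."""
    problems = []
    for n in sorted(ngrams):
        if n < 2:
            continue
        lower = ngrams.get(n - 1, set())
        for gram in sorted(ngrams[n]):
            for part, window in (("prefix", gram[:-1]), ("suffix", gram[1:])):
                if window not in lower:
                    problems.append((n, gram, part, window))
    return problems


# Hypothetical usage: python check_arpa.py lm.en.arpa
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as arpa:
        grams = read_arpa_ngrams(arpa)
    for n, gram, part, window in missing_lower_order(grams):
        print(f"{n}-gram {' '.join(gram)!r}: {part} {' '.join(window)!r} "
              f"not listed among the {n - 1}-grams")
```

Running it on the output of -lm=sb and of the modified shift-beta option would show whether the missing 4-grams really come from IRSTLM or from the binarization step.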