Hi,
Slightly off-topic, but I am out of ideas. I am trying to figure out 
which set of parameters I have to use with IRSTLM to create LMs that are 
equivalent to language models created with SRILM using the following 
command:

(SRILM:) ngram-count -order 5 -unk -interpolate -kndiscount -text input.en -lm lm.en.arpa

So far, I have been using this chain of commands for IRSTLM:

perl -C -pe 'chomp; $_ = "<s> $_ </s>\n"' < input.en > input.en.sb
ngt -i=input.en.sb -n=5 -b=yes -o=lm.en.bin
tlm -tr=lm.en.bin -lm=sb -bo=yes -n=5 -o=lm.en.arpa

I know this is not quite the same, but it comes closest in terms of 
quality and size. The translation results, however, are still 
consistently worse than with SRILM models, with differences in BLEU of 
up to 1%.
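
For the size comparison, a minimal sketch that reads the n-gram counts 
declared in the \data\ header of an ARPA file, so that two models can be 
compared order by order. The function names are made up for this 
illustration, and the sketch assumes the usual "ngram N=COUNT" header 
lines.

```python
import re


def arpa_counts(lines):
    """Return {order: count} as declared in the \\data\\ header of an ARPA file."""
    counts = {}
    for line in lines:
        line = line.strip()
        match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif counts and line.startswith("\\"):
            # First section marker after the header (e.g. "\1-grams:"): stop reading.
            break
    return counts


def compare_counts(counts_a, counts_b):
    """Pair up the declared counts of two models for every order found in either."""
    orders = sorted(set(counts_a) | set(counts_b))
    return {n: (counts_a.get(n, 0), counts_b.get(n, 0)) for n in orders}
```

It could be called on open file handles, for example 
compare_counts(arpa_counts(open("lm.en.arpa")), arpa_counts(open("lm.srilm.en.arpa"))), 
where the second file name is only a placeholder for the SRILM output.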

I use KenLM with Moses to binarize the resulting ARPA files, so the 
difference should not be a code issue on the LM querying side.

Also, it seems IRSTLM has a bug with the modified shift-beta option. At 
least KenLM complains that not all 4-grams are present although there 
are 5-grams that contain them.
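
To narrow this down, here is a rough sketch that lists the n-grams in an 
ARPA file whose context is missing from the next lower order. It rests 
on two assumptions of mine: that the KenLM complaint is about the 
context (the first n-1 words) of a higher-order n-gram, and that each 
entry line has the form "logprob words [backoff]" separated by 
whitespace. The function names are invented for this example.

```python
import re
from collections import defaultdict


def read_arpa_ngrams(lines):
    """Collect the n-grams of each order from the lines of an ARPA file."""
    ngrams = defaultdict(set)
    order = None
    for line in lines:
        line = line.strip()
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            order = int(header.group(1))
            continue
        if line == "\\end\\":
            break
        if order is None or not line:
            # Still inside the \data\ header, or a blank separator line.
            continue
        tokens = line.split()
        # Assumed layout: log-probability, then `order` words, then optional backoff.
        words = tuple(tokens[1:1 + order])
        if len(words) == order:
            ngrams[order].add(words)
    return ngrams


def missing_contexts(ngrams):
    """Return (ngram, context) pairs where the (n-1)-word context is absent."""
    missing = []
    for order in sorted(ngrams):
        if order < 2:
            continue
        lower = ngrams.get(order - 1, set())
        for ngram in sorted(ngrams[order]):
            context = ngram[:-1]
            if context not in lower:
                missing.append((ngram, context))
    return missing
```

Running missing_contexts(read_arpa_ngrams(open("lm.en.arpa"))) on the 
IRSTLM output should then show which 5-grams lack their 4-gram context, 
if that is indeed what KenLM is objecting to.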

Any ideas?
Thanks,
Marcin
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
