Hello,

I am looking at the debug output of the following command:

    ./ngram-count -order 2 -read toy.ngrams -lm toy.binary \
        -interpolate -ukndiscount -gt1min 0 -gt2min 0 \
        -write-binary-lm -debug 2

For my training data, the n1 and n2 values reported for 1-grams do not
match the actual n1 and n2 (the number of 1-grams seen exactly once and
exactly twice) that I get from the data. The debug output seems to confirm
that the counts are being changed, since it prints "modifying 1-gram counts
for Kneser-Ney smoothing". For 2-grams, however, n1 and n2 match the actual
counts. Here is a sample of the output:

    using KneserNey for 1-grams
    modifying 1-gram counts for Kneser-Ney smoothing
    Kneser-Ney smoothing 1-grams
    n1 = 704
    n2 = 189
    D = 0.650647    <-- the discount also changes, since it is based on n1 and n2
    using KneserNey for 2-grams
    Kneser-Ney smoothing 2-gram
    n1 = 2884
    n2 = 274
    D = 0.840326
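
Both D values above are reproduced by the single-discount formula D = n1 / (n1 + 2*n2). I am assuming this is what -ukndiscount uses, based only on the fact that the numbers match. The following small sketch checks that, with n1 and n2 copied from the output:

```python
def kn_discount(n1: int, n2: int) -> float:
    """Single absolute discount D = n1 / (n1 + 2*n2) (assumed for -ukndiscount)."""
    return n1 / (n1 + 2 * n2)

# (order, n1, n2) copied from the debug output above
for order, n1, n2 in [(1, 704, 189), (2, 2884, 274)]:
    print(f"{order}-grams: D = {kn_discount(n1, n2):.6f}")
```

This prints D = 0.650647 for 1-grams and D = 0.840326 for 2-grams. So changing the 1-gram counts changes n1 and n2, and therefore the 1-gram discount.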

So, here are my questions:
1) Is there a way to turn off this modification and force SRILM to use
the actual n1 and n2?
2) What are the criteria for modifying n1 and n2, and how is it done? Is
there any documentation on this?
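
My guess (an assumption; I have not checked the SRILM source) is that this follows the textbook Kneser-Ney scheme. In that scheme, the count of a lower-order N-gram is replaced by the number of distinct words that precede it, rather than by how often it occurs. The sketch below uses a made-up toy corpus and ignores <s> and </s>. It shows how that replacement alone changes n1 and n2 for 1-grams:

```python
from collections import Counter

def count_of_counts(counts):
    """Return (n1, n2): how many types occur exactly once and exactly twice."""
    freq = Counter(counts.values())
    return freq[1], freq[2]

def continuation_counts(tokens):
    """Textbook Kneser-Ney unigram counts: for each word, the number of
    distinct words that precede it (not how often it occurs)."""
    distinct_bigrams = set(zip(tokens, tokens[1:]))
    return Counter(w2 for _, w2 in distinct_bigrams)

# Hypothetical toy corpus; sentence boundaries are ignored for simplicity.
tokens = "the cat sat on the mat the dog sat on a mat".split()

print("raw unigram counts:  n1, n2 =", count_of_counts(Counter(tokens)))
print("continuation counts: n1, n2 =", count_of_counts(continuation_counts(tokens)))
```

Here the raw counts give (3, 3) and the continuation counts give (4, 3). For example, "on" occurs twice but only ever after "sat", so its modified count is 1.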

Thanks,
Koorm
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
