Hello,

I am looking at the debug output of

./ngram-count -order 2 -read toy.ngrams -lm toy.binary -interpolate -ukndiscount -gt1min 0 -gt2min 0 -write-binary-lm -debug 2

For my training data, the n1 and n2 values reported for 1-grams do not match the actual n1 and n2 I get from the data. The debug output seems to confirm this: it prints "modifying 1-gram counts for Kneser-Ney smoothing". For 2-grams, however, the reported values do match the actual n1 and n2. Here is a sample of the output:

    using KneserNey for 1-grams
    modifying 1-gram counts for Kneser-Ney smoothing
    Kneser-Ney smoothing 1-grams
    n1 = 704
    n2 = 189
    D = 0.650647      // the discount also changes, since it is based on n1 and n2
    using KneserNey for 2-grams
    Kneser-Ney smoothing 2-gram
    n1 = 2884
    n2 = 274
    D = 0.840326

So, here are my questions:

1) Is there a way to avoid this modification and force SRILM to use the actual n1 and n2?
2) What are the criteria for modifying n1 and n2, and how is it done? Is there any documentation?

Thanks,
Koorm
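For reference, here is a minimal sketch of the standard Kneser-Ney count modification. It assumes SRILM follows the textbook formulation and has not been checked against the SRILM source. In that formulation, every lower-order (here 1-gram) count is replaced by a "continuation count": the number of distinct words that precede it in the bigram counts. n1 and n2 are then counted over these modified counts, so they differ from the raw 1-gram n1 and n2. The 2-gram counts are the highest order and stay unchanged. The single discount is D = n1 / (n1 + 2*n2), which reproduces both D values in the debug output (704/1082 and 2884/3432). The function names and toy bigram counts below are hypothetical and only serve as illustration.

```python
from collections import Counter, defaultdict


def continuation_counts(bigram_counts):
    """Replace each word's 1-gram count with the number of distinct words
    that precede it (the Kneser-Ney continuation count N1+(. w))."""
    left_contexts = defaultdict(set)
    for (prev, word), count in bigram_counts.items():
        if count > 0:
            left_contexts[word].add(prev)
    return {word: len(ctxs) for word, ctxs in left_contexts.items()}


def count_of_counts(counts):
    """Return (n1, n2): how many n-grams have a count of exactly 1 and 2."""
    freq = Counter(counts.values())
    return freq[1], freq[2]


def kn_discount(n1, n2):
    """Single absolute discount used by original (unmodified) Kneser-Ney."""
    return n1 / (n1 + 2 * n2)


# Hypothetical toy bigram counts (not taken from the original data).
bigrams = {
    ("<s>", "the"): 3,
    ("the", "cat"): 2,
    ("a", "cat"): 1,
    ("the", "dog"): 1,
    ("cat", "</s>"): 2,
    ("dog", "</s>"): 1,
}

# Raw 1-gram counts, approximated here (an assumption for the sketch) by
# summing bigram counts over the preceding word.
raw = Counter()
for (_, word), c in bigrams.items():
    raw[word] += c

kn = continuation_counts(bigrams)
print("raw n1, n2:", count_of_counts(raw))  # (1, 0)
print("KN  n1, n2:", count_of_counts(kn))   # (2, 2)
print("D (1-grams):", round(kn_discount(704, 189), 6))
print("D (2-grams):", round(kn_discount(2884, 274), 6))
```

In this toy case "cat" occurs three times but only after two distinct words, so its modified count is 2. This is why the 1-gram n1 and n2 shift while the 2-gram ones do not.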
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
