SRILM prunes singleton n-grams of order three and above by default. You're likely to get better answers to SRILM-specific questions on the srilm-user mailing list.
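To see which n-grams singleton pruning would touch in Example-2 from the quoted message, here is a minimal sketch that counts trigrams in that training string and lists the ones occurring exactly once. It is not SRILM's code. The helper `ngram_counts` and the variable names are hypothetical, and it assumes the training text is a single sentence padded with `<s>` and `</s>`.

```python
from collections import Counter


def ngram_counts(tokens, order):
    """Count every n-gram of the given order in one padded sentence.

    Assumption: the sentence gets a single <s> at the start and a single
    </s> at the end; this is an illustration, not SRILM's implementation.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return Counter(
        tuple(padded[i:i + order]) for i in range(len(padded) - order + 1)
    )


# Training text of Example-2, treated as one sentence (assumption).
training = "13 13 13 13 13 13 17 17 17 17 17 14 14 15 15 15 16 16 16 16".split()

trigrams = ngram_counts(training, 3)

# Trigrams seen exactly once: the candidates for singleton pruning.
singleton_trigrams = sorted(g for g, c in trigrams.items() if c == 1)

print(f"{len(singleton_trigrams)} of {len(trigrams)} distinct trigrams are singletons")
for gram in singleton_trigrams:
    print(" ".join(gram))
```

Under these assumptions, 11 of the 14 distinct trigrams occur only once, so a singleton cutoff at the trigram level would leave just three of them with explicit counts.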

On 02/22/2015 06:28 AM, koormoosh wrote:
> Hi,
>
> I wonder if SRI does any sort of implicit pruning or refinement? To be more
> precise, is there any way to force SRI not to prune anything (removing
> singletons, etc). I thought that my way of calling it does what I want (not
> pruning), but then I don't know how to explain getting different results.
> This is how I call SRI:
>
> ----------------------------------------------------------------------
> ./ngram-count -order 3 -text training.txt -write training.ngrams
>
> ./ngram-count -order 3 -read training.ngrams -lm training.binary
> -interpolate -ukndiscount -gt1min 0 -gt2min 0 -gt3min 0 -write-binary-lm
>
> ./ngram -order 3 -lm training.binary -ppl test.txt -debug 2
>
> am I missing/misusing something?
>
> ----------------------------------------------------------------------
> An example to show this problem:
> (Example-1):
> Test: "13 13 13"
> Training: "13 13 13 13 17"
> perplexity *matches* SRI: "2.79327"
>
> (Example-2):
> Test: "13 13 13"
> Training "13 13 13 13 13 13 17 17 17 17 17 14 14 15 15 15 16 16 16 16"
> perplexity *doesn't match* SRI: "4.51546" and what SRI returns us "4.242".
> ----------------------------------------------------------------------
>
> Thanks in advance,
> Koorm
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
