it has been a while since i looked at this, but look at this (Good-Turing):

*not pruning*

[rydell]miles: ./ngram-count -lm /tmp/test2.lm -order 3 -gt1min 0 -gt2min 0 -gt3min 0 -text ../../../mt/diskbased-lm-training/temp.txt
warning: discount coeff 1 is out of range: 5.55654e-17
warning: discount coeff 6 is out of range: 1.17267
warning: discount coeff 7 is out of range: 1.14801
warning: count of count 8 is zero -- lowering maxcount
warning: count of count 7 is zero -- lowering maxcount
warning: count of count 6 is zero -- lowering maxcount
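
For readers wondering where warnings like these come from, here is a minimal sketch of the textbook Katz form of Good-Turing discounting. It is not SRILM's code: the function names, the range check (0 < d <= 1) and the toy count-of-counts table are all assumptions made for illustration, and only the warning wording is borrowed from the output above.

```python
def effective_max_count(n, max_count):
    """Lower max_count while the count-of-count just above it is zero."""
    while max_count > 0 and n.get(max_count + 1, 0) == 0:
        max_count -= 1
    return max_count


def katz_discounts(n, max_count):
    """Katz-style Good-Turing discount d_r for 1 <= r <= max_count.

    n maps a count r to n_r, the number of distinct n-grams seen r times.
    """
    k = effective_max_count(n, max_count)
    if k == 0 or n.get(1, 0) == 0:
        return {}
    # Term shared by every r: (k + 1) * n_{k+1} / n_1
    common = (k + 1) * n[k + 1] / n[1]
    if common == 1.0:
        return {}  # the formula below would divide by zero
    discounts = {}
    for r in range(1, k + 1):
        if n.get(r, 0) == 0:
            discounts[r] = float("nan")  # undefined: no n-grams with this count
            continue
        raw = (r + 1) * n.get(r + 1, 0) / (r * n[r])
        discounts[r] = (raw - common) / (1.0 - common)
    return discounts


def out_of_range(d):
    """Assumed sanity check: a usable discount lies in (0, 1]."""
    return not (0.0 < d <= 1.0)


# Hypothetical, deliberately sparse count-of-counts table (not from temp.txt).
toy = {1: 10, 2: 1, 3: 5, 4: 1}
for r, d in katz_discounts(toy, 7).items():
    if out_of_range(d):
        print(f"discount coeff {r} is out of range: {d:g}")
```

On a tiny corpus the count-of-counts are sparse and non-monotonic, so the ratios in the formula easily fall outside (0, 1], which is one plausible reading of the warnings in both runs.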

[rydell]miles: head /tmp/test2.lm

\data\
ngram 1=161
ngram 2=306
ngram 3=328

\1-grams:
-1.980823       !=      -0.1440484

*pruning*

[rydell]miles: ./ngram-count -lm /tmp/test3.lm -order 3 -text ../../../mt/diskbased-lm-training/temp.txt
warning: discount coeff 1 is out of range: 5.55654e-17
warning: discount coeff 6 is out of range: 1.17267
warning: discount coeff 7 is out of range: 1.14801
warning: count of count 8 is zero -- lowering maxcount
warning: count of count 7 is zero -- lowering maxcount
warning: count of count 6 is zero -- lowering maxcount
[rydell]miles: head /tmp/test3.lm

\data\
ngram 1=161
ngram 2=306
ngram 3=44
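
To make the comparison explicit, here is a small sketch that reads the `\data\` header of an ARPA-format LM and reports the per-order difference. The function name is hypothetical, and the two headers are copied from the `head` output above rather than read from /tmp. The result (only the trigram count changes) is consistent with ngram-count's documented default minimum counts of 1 for unigrams and bigrams and 2 for higher orders, though treat that reading as an assumption.

```python
import re


def read_ngram_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA LM."""
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            m = re.match(r"ngram (\d+)=(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line:
                break  # first non-blank, non-count line ends the header
    return counts


# Headers as shown in the two runs above.
unpruned = read_ngram_counts(
    ["", "\\data\\", "ngram 1=161", "ngram 2=306", "ngram 3=328", "", "\\1-grams:"]
)
pruned = read_ngram_counts(
    ["", "\\data\\", "ngram 1=161", "ngram 2=306", "ngram 3=44", ""]
)

# n-grams dropped per order when the -gtNmin 0 overrides are left out
dropped = {order: unpruned[order] - pruned[order] for order in unpruned}
print(dropped)
```

With real files, pass an open file handle instead of the inline lists, e.g. `read_ngram_counts(open("/tmp/test2.lm"))`.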

Miles

2008/8/5 Miles Osborne <[EMAIL PROTECTED]>

> you want to also check that ngrams are not getting pruned by probability
> (in addition to counts)
>
> this whole business is a bit on the murky side and the only reason i know
> about it was when i was writing a disk-based version of ngram-count a year
> or so back
>
> Miles
>
> 2008/8/5 John D. Burger <[EMAIL PROTECTED]>
>
>> Miles Osborne wrote:
>>
>>
>> > by default the srilm prunes singletons
>>
>> OK, that's good to know.  But when I prune the IRST LM, I still get
>> lots =more= 4-grams than the SRI LM, but lots =fewer= 5-grams
>> (although less than a factor of two in either case).
>>
>> But perhaps I'm a bit in the weeds here ... :)
>>
>> - John Burger
>>   MITRE
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>



