Re: [Moses-support] Get the probability of a given n-gram in a language model

Kenneth Heafield Fri, 23 May 2014 08:38:28 -0700

Hi,

        You can use bin/query on an ARPA or KenLM file.  Then just type
sentences at it (or redirect a file to stdin).  By default it assumes
each line is a complete sentence and wraps it in <s> and </s>; pass -n
to skip that wrapping.
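A minimal Python sketch of what the wrapping changes, assuming a tiny hypothetical trigram model in ARPA style. Every n-gram and number below is invented; this illustrates standard ARPA backoff arithmetic, not KenLM's code or the output format of bin/query. The default conditions on <s> and also predicts </s>; -n scores the words as they are.

```python
# Hypothetical toy trigram model, ARPA-style: n-gram -> (log10 prob, log10 backoff).
# Every entry and number here is invented for illustration only.
MODEL = {
    ("<s>",): (-99.0, -0.30),  # <s> is only ever context, never predicted
    ("</s>",): (-1.00, 0.0),
    ("the",): (-1.20, -0.25),
    ("cat",): (-2.00, -0.20),
    ("sat",): (-2.10, -0.15),
    ("<s>", "the"): (-0.50, -0.10),
    ("the", "cat"): (-0.90, -0.05),
    ("cat", "sat"): (-0.80, 0.0),
    ("<s>", "the", "cat"): (-0.70, 0.0),
}
ORDER = 3


def log10_prob(context, word):
    """log10 p(word | context) with ARPA backoff: use the longest matching
    n-gram, adding the backoff weight of each context that had to be dropped."""
    context = tuple(context)[-(ORDER - 1):]
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in MODEL:
            return penalty + MODEL[ngram][0]
        if not context:
            raise KeyError(f"{word!r} not in toy model")  # no <unk> handling here
        penalty += MODEL.get(context, (0.0, 0.0))[1]  # absent context: backoff 0
        context = context[1:]


def score(sentence, wrap=True):
    """Total log10 probability of a line. wrap=True mimics the default
    (condition on <s>, also predict </s>); wrap=False mimics -n."""
    words = sentence.split()
    context = ["<s>"] if wrap else []
    total = 0.0
    for word in words + (["</s>"] if wrap else []):
        total += log10_prob(context, word)
        context.append(word)
    return total


if __name__ == "__main__":
    print(score("the cat sat"))              # first word scored via the <s> bigram
    print(score("the cat sat", wrap=False))  # first word scored via its unigram
```

With these invented numbers, "the cat sat" scores about -3.20 as a wrapped sentence and about -2.95 with wrap=False, because without <s> the first word can only be scored by its unigram entry.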


        It appears that you are asking to score sentence fragments.  The
leading words of a fragment lack full context, so they will be scored
using the unigram, bigram, etc. entries of, say, a 5-gram model.  If you
are using Kneser-Ney, these lower-order probabilities (unigrams through
4-grams) are conditioned on having backed off to them, which is not the
situation at the start of a fragment.  If you want accurate scores for
sentence fragments, build separate models of order 1, 2, 3, and 4
alongside the 5-gram model, then combine them using

build_binary -r "1.arpa 2.arpa 3.arpa 4.arpa" 5.arpa 5.rest

You can then use

bin/fragment 5.rest <fragments

to obtain log10 probabilities.  For more on this rant, read

http://kheafield.com/professional/edinburgh/rest_paper.pdf
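To see what "conditioned on having backed off" means, here is a minimal Python sketch on a toy corpus invented for illustration. It compares a plain relative-frequency unigram, roughly what a separate order-1 model estimates (smoothing ignored), with the Kneser-Ney continuation unigram (discounting ignored), which counts how many distinct words precede w rather than how often w occurs. The corpus and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: "francisco" is frequent but always follows "san".
CORPUS = [
    "i live in san francisco",
    "san francisco is foggy",
    "we flew to san francisco",
    "i live in a house",
    "a house is a home",
]


def unigram_mle(corpus):
    """Plain relative frequency c(w) / total tokens (no smoothing)."""
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unigram_continuation(corpus):
    """Kneser-Ney continuation probability without discounting:
    distinct left neighbours of w / number of distinct bigram types.
    Each line is padded with <s> so sentence-initial words get a left neighbour."""
    bigram_types = set()
    for line in corpus:
        words = ["<s>"] + line.split()
        bigram_types.update(zip(words, words[1:]))
    left_contexts = Counter(w for _, w in bigram_types)
    return {w: n / len(bigram_types) for w, n in left_contexts.items()}


if __name__ == "__main__":
    mle, cont = unigram_mle(CORPUS), unigram_continuation(CORPUS)
    for w in ("san", "francisco"):
        print(f"{w:10s} MLE={mle[w]:.3f} continuation={cont[w]:.3f}")
```

Here "san" and "francisco" both occur 3 times out of 24 tokens (0.125 each), but "francisco" only ever follows "san", so its continuation probability is 1/18 against 3/18 for "san". That low value suits the case where the model has backed off from an unseen context, but it underrates "francisco" at the start of a fragment, where the preceding word is simply unknown.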

Kenneth 

On 05/23/14 05:13, Albert Llorens wrote:
> Hi,
>
> Is there a straightforward way I can ask Moses for the probability (or
> the frequency) of a given n-gram in a given language model? If so, can I
> do the query through mosesserver?
>
> Thanks.
>
> Kind regards.
>
> Albert
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 