Re: [Moses-support] Skip OOV when computing Language Model score

Jie Jiang Fri, 15 Jan 2016 05:43:49 -0800

Hi Ergun:

The original request in Quang's post was:


*For instance, with the n-gram: "the <unk> house <unk> in", I would like
the decoder to assign it the probability of the phrase: "the house in"
(existing in the LM).*

So each time a <unk> appears while calculating the LM score, you need to
skip it and look one word further back in the history.

I believe this cannot be achieved with the current LM tools without modifying
the source code, as Kenneth has already explained.
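
To make the difference concrete, here is a minimal Python sketch of a toy
backoff LM. It is not Moses, SRILM or KenLM code: the log10 probabilities,
the backoff weights and the function names (score_word, score_standard,
score_skip_oov) are all invented for illustration, and sentence-boundary
tokens are ignored.

```python
# Toy ARPA-style model: log10 probabilities and backoff weights.
# All numbers are made up for illustration only.
LOGPROB = {
    ("the",): -1.0,
    ("house",): -2.0,
    ("in",): -1.5,
    ("<unk>",): -3.0,
    ("the", "house"): -0.5,
    ("house", "in"): -0.7,
    ("the", "house", "in"): -0.3,
}
BACKOFF = {
    ("the",): -0.4,
    ("house",): -0.3,
    ("in",): -0.2,
    ("the", "house"): -0.2,
}


def score_word(context, word, order=3):
    """Return log10 p(word | context) using standard backoff."""
    context = tuple(context[-(order - 1):]) if order > 1 else ()
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in LOGPROB:
            return penalty + LOGPROB[ngram]
        if not context:
            # Word is not in the vocabulary: score it as <unk>.
            return penalty + LOGPROB[("<unk>",)]
        # Charge the backoff weight of the context, then shorten it.
        penalty += BACKOFF.get(context, 0.0)
        context = context[1:]


def score_standard(words, order=3):
    """Score every token, <unk> included, as a normal LM would."""
    total, history = 0.0, []
    for w in words:
        total += score_word(history, w, order)
        history.append(w)
    return total


def score_skip_oov(words, order=3):
    """Hypothetical behaviour from the original request: drop OOV tokens
    from both the scored words and the history before scoring."""
    known = [w for w in words if w != "<unk>" and (w,) in LOGPROB]
    return score_standard(known, order)


sentence = "the <unk> house <unk> in".split()
standard = score_standard(sentence)
# Subtracting the two log p(<unk>) terms, as an OOV weight might do:
cancelled = standard - 2 * LOGPROB[("<unk>",)]
skipped = score_skip_oov(sentence)
print(standard, cancelled, skipped)
```

In this toy model, cancelling the log p(<unk>) terms still does not give
the score of "the house in": the standard score keeps b(the) and b(house)
and uses p(house) and p(in) instead of p(house | the) and
p(in | the house), which are the two problems Kenneth lists below.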


2016-01-15 13:20 GMT+00:00 Ergun Bicici <[email protected]>:

>
> Dear Kenneth,
>
> The Moses manual mentions the -drop-unknown switch:
>
> 4.7.2 Handling Unknown Words
> Unknown words are copied verbatim to the output. They are also scored by
> the language model, and may be placed out of order. Alternatively, you may
> want to drop unknown words. To do so add the switch -drop-unknown.
>
> Alternatively, you can write a script that replaces all OOV tokens with
> some OOV-token-identifier such as <unk> before sending for translation.
>
>
> *Best Regards,*
> Ergun
>
> Ergun Biçici
> DFKI Projektbüro Berlin
>
>
> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <[email protected]>
> wrote:
>
>> Hi,
>>
>>         I think oov-feature=1 just activates the OOV count feature while
>> leaving LM score unchanged.  So it would still include p(<unk> | in).
>>
>>         One might try setting the OOV feature weight to -weight_LM *
>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
>> the log p(<unk>) terms.  However that won't work either because:
>>
>> 1) It will still charge backoff penalties, b(the)b(house) in the example.
>>
>> 2) The context will be lost each time so it's p(house) not p(house | the).
>>
>> If the <unk>s follow a pattern, such as appearing every other word, one
>> could insert them into the ARPA file though that would waste memory.
>>
>> I don't think there's any way to accomplish exactly what OP asked for
>> without coding (though it wouldn't be that hard once one understands how
>> the LM infrastructure works).
>>
>> Kenneth
>>
>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>> > Hi,
>> >
>> > You may get the behavior you want by adding
>> >   "oov-feature=1"
>> > to your LM specification line in moses.ini
>> > and also add a second weight with value "0" to the corresponding LM
>> > weight setting.
>> >
>> > This will then only use the scores
>> > p(the|<s>)
>> > p(house|<s>,the,<unk>) ---> backoff to p(house)
>> > p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>> >
>> > -phi
>> >
>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>> > <[email protected] <mailto:[email protected]>> wrote:
>> >
>> >     Dear All,
>> >
>> >     I am currently using an SRILM language model (LM) in my Moses
>> >     decoder. Does anyone know how I can make the decoder skip all
>> >     out-of-vocabulary words at decoding time when computing the LM
>> >     score (instead of backing off)?
>> >
>> >     For instance, with the n-gram: "the <unk> house <unk> in", I would
>> >     like the decoder to assign it the probability of the phrase: "the
>> >     house in" (existing in the LM).
>> >
>> >     Do I need more options/declarations in the moses.ini file?
>> >
>> >     Any help is very much appreciated,
>> >
>> >     Best,
>> >     Quang
>> >
>> >
>> >
>> >     _______________________________________________
>> >     Moses-support mailing list
>> >     [email protected] <mailto:[email protected]>
>> >     http://mailman.mit.edu/mailman/listinfo/moses-support


-- 

Best regards!

Jie Jiang

