Re: [Moses-support] producing the minimal number of LM-OOVs

Alexander Fraser Sat, 19 Mar 2011 11:17:32 -0700

Hi Kenneth,

You mean log probability value = 0, I think? Probability = 0 in
Moses-land is often redefined to mean a floored log probability value
of -20 or -40 or whatever; this is what I thought the error message
was referring to. Yes, that is what is causing the problem.
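Here is a minimal sketch of why a log probability of 0 for LM-OOVs causes this. It assumes a toy context-free model with made-up log10 probabilities (ARPA files store log10 values) and ignores the backoff penalty Kenneth mentions. The word "haus", the floor of -20, and the OOV weight of -10 are illustrative assumptions. The `oov_count` function is a hypothetical stand-in for the OOV count feature Kenneth proposes. None of this is Moses or KenLM code.

```python
from typing import Dict, List

# Toy unigram log10 probabilities (hypothetical numbers, not from a real model).
TOY_LM: Dict[str, float] = {
    "the": -1.2,
    "house": -3.1,
    "is": -1.5,
    "small": -2.8,
}

def lm_score(words: List[str], oov_log10: float) -> float:
    """Sum of per-word log10 probabilities; unseen words receive oov_log10."""
    return sum(TOY_LM.get(w, oov_log10) for w in words)

def oov_count(words: List[str]) -> int:
    """Number of words the toy LM has never seen (hypothetical extra feature)."""
    return sum(1 for w in words if w not in TOY_LM)

def total_score(words: List[str], oov_log10: float,
                lm_weight: float = 1.0, oov_weight: float = 0.0) -> float:
    """Log-linear combination of the LM score and the OOV-count feature."""
    return lm_weight * lm_score(words, oov_log10) + oov_weight * oov_count(words)

translated = ["the", "house", "is", "small"]
untranslated = ["the", "haus", "is", "small"]  # source word passed through, LM-OOV

for label, oov_log10, oov_weight in [
    ("log10 p = 0 for OOV", 0.0, 0.0),
    ("floor log10 p = -20", -20.0, 0.0),
    ("log10 p = 0 plus OOV penalty", 0.0, -10.0),
]:
    # Pick whichever hypothesis the model scores higher under this setting.
    better = max([translated, untranslated],
                 key=lambda ws: total_score(ws, oov_log10, oov_weight=oov_weight))
    print(f"{label}: {' '.join(better)}")
```

With an OOV log probability of 0, the hypothesis that passes "haus" through scores higher than the one that translates it. This happens because every in-vocabulary word carries a negative log probability. A low floor, or a negatively weighted OOV count, reverses the preference.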


Cheers, Alex


On Sat, Mar 19, 2011 at 6:25 PM, Kenneth Heafield wrote:
> The original behavior was to refuse to load any model without <unk>.
> Early on, Hieu asked me to change that.  The default is now to
> substitute probability 0.0 and print this complaint to stderr:
>
> The ARPA file is missing <unk>.  Substituting probability 0.0.
>
> SRI's ngram scoring tool skips OOVs, so a probability of 0.0 reproduces
> that behavior (though I still charge the backoff penalty from preceding
> words).  I'm still not happy with it.
>
> Documentation like http://statmt.org/wmt11/baseline.html carries
> influence.  Can you add -unk?
>
> On 03/19/11 13:07, Philipp Koehn wrote:
>> Hi,
>>
>> I have recently built all my language models with the "-unk" flag,
>> so it creates probability mass for unseen words (there is a line
>> for <unk> in the language model file).
>>
>> But I am actually not sure if the SRILM interface properly uses
>> this probability. It may just fall back to a very low floor.
>> So it may be that Alex's desired feature is just a bug, which can
>> be reproduced with kenlm by not training with "-unk", hence
>> also falling back to the floor probability (if that is what kenlm
>> is doing).
>>
>> -phi
>>
>> On Sat, Mar 19, 2011 at 4:59 PM, Kenneth Heafield wrote:
>>> I believe the right answer to this is adding an OOV count feature to
>>> Moses.  In fact, I've gone through and made all the language models
>>> return a struct indicating if the word just scored was OOV.  However,
>>> this needs to make it into the phrases and ultimately the features.
>>> Also, there's the fun of adding a config option to moses.ini.  Thoughts
>>> on default behavior?
>>>
>>> You can control the unknown word probability by passing -u probability
>>> to build_binary.  Set that to something negative.  It will only be
>>> effective if the ARPA file was trained without <unk>.
>>>
>>> Also, is there any evidence out there for or against passing -unk to
>>> SRILM?
>>>
>>> Kenneth
>>>
>>> On 03/19/11 12:51, Alexander Fraser wrote:
>>>> Hi Folks,
>>>>
>>>> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
>>>> suggestion to create an open-vocab LM (I usually use closed-vocab) but
>>>> I think this means that in some context a LM-OOV could be produced in
>>>> preference to a non LM-OOV. This should not be the case in standard
>>>> phrase-based SMT (e.g., using the feature functions of the Moses
>>>> baseline for the shared task). Instead, Moses should
>>>> produce the minimal number of LM-OOVs possible.
>>>>
>>>> There are exceptions to this when using different feature functions.
>>>> For instance, we have a paper on trading off transliteration vs
>>>> semantic translation (for Hindi to Urdu translation), where the
>>>> transliterations are sometimes LM-OOV, but still a better choice than
>>>> available semantic translations (which are not LM-OOV). But the
>>>> overall SMT models we used support this specific trade-off (and it
>>>> took work to make the models do this correctly, as described in
>>>> the paper).
>>>>
>>>> I believe for the other three LM packages used with Moses the minimal
>>>> number of LM-OOVs is always produced. I've switched back to
>>>> Moses+SRILM for now due to this issue. I think it may be the case that
>>>> Moses+KenLM actually produces the maximal number of OOVs allowed by
>>>> the phrases loaded, which would be highly undesirable. Empirically, it
>>>> certainly produces more than Moses+SRILM in my experiments.
>>>>
>>>> Thanks and Cheers, Alex
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

