I believe -vocab takes a file containing the vocabulary and maps everything else in your training data to the OOV token, including producing n-grams that contain <unk>. Placing <unk> in the training data yourself will cause it to be treated like any other word in the corpus, which seems to be what you want.
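For instance, something like this (a command-line sketch; the file names and smoothing options are made up, not taken from this thread):

    ngram-count -order 3 -unk -vocab vocab.txt -text train.txt \
        -kndiscount -interpolate -lm open.arpa

Here -vocab vocab.txt rewrites every out-of-vocabulary token in train.txt as <unk>, and -unk keeps <unk> as a regular word so the n-grams containing it survive into the ARPA file.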
With the -100 penalty, all you're doing is forcing the OOV feature weight to be -100 * the LM weight; e.g., if MERT sets the LM weight to 0.5, every OOV costs a fixed 0.5 * -100 = -50 in model score. I suspect MERT can do a better job of determining the ratio of these weights for your particular data, but MERT is known to make mistakes.

Pass-through and language model OOV are close, but separate, issues. A passed-through phrase table OOV is often still found in the language model.
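One quick way to check whether your passed-through words are actually LM-OOVs is SRILM's perplexity tool; a sketch with made-up file names, assuming a closed-vocabulary ARPA model:

    ngram -lm foo.arpa -ppl moses.out

The -ppl summary line reports the number of OOV tokens alongside the logprob and perplexity figures.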
Kenneth

On 03/19/11 15:01, Alexander Fraser wrote:
> Cool, thanks for the explanation and fix.
>
> What does -vocab do? Is it a trick to replace things that are not in
> the vocab with <unk>? Does explicitly putting <unk> in the training
> data not work? I thought I could do that; the SRILM FAQ seems to
> indicate that this will work, but I haven't tried it yet.
>
> How exactly are you folks training your open-vocab LMs? Are you
> replacing something (singleton LM vocab?) with <unk>, or just adding a
> single line to the training data with <unk> in it? I think SRILM
> prunes singletons by default; does that affect <unk> at all?
>
> I agree in general about OOVs, but I still think it is questionable
> whether the addition of a single penalty is enough to let the baseline
> Moses model intelligently trade off between LM-OOV and LM-known
> (assuming that the parallel corpus is in the LM, which I
> experimentally verified is a good idea many years ago, and I think the
> result probably still holds). But perhaps Chris already has the
> results to prove me wrong. Anyway, I agree that adding this feature
> function is the right solution.
>
> BTW, if you think the Moses model with the addition of the penalty can
> do this trade-off correctly, then you should allow pass-through for
> *all* words, not just words that can wind up uncovered; you would then
> get a further improvement.
>
> Cheers, Alex
>
> On Sat, Mar 19, 2011 at 7:18 PM, Kenneth Heafield <[email protected]> wrote:
>> With a closed-vocabulary LM, SRILM returns -inf on OOV and Moses floors
>> this to LOWEST_SCORE, which is -100.0. If you want identical behavior
>> from KenLM,
>>
>> kenlm/build_binary -u -100.0 foo.arpa foo.binary
>>
>> Unless you passed -vocab to SRILM (and most people don't), <unk> never
>> appears except as a unigram. Therefore, Chris is not getting any gain
>> from additional conditioning.
>>
>> OOVs can be good: names of people who appear in the news, new product
>> names, etc.
>>
>> On 03/19/11 14:02, Alexander Fraser wrote:
>>> Hi Folks --
>>>
>>> An LM-OOV feature sounds like a good solution to me. Chris, have you
>>> tried pegging the LM-OOV feature weight at an extremely high value? I
>>> suspect the gains you are getting are due to the use of <unk> in LM
>>> conditioning, i.e., p(word|... <unk> ...), rather than due to allowing
>>> more LM-OOVs.
>>>
>>> If the LM-OOV feature were defaulted to an extremely high value, we
>>> would get the behavior that Moses+SRILM has, but people who wanted to
>>> could try training the weight.
>>>
>>> I think using an open-class LM without such a penalty is not a good
>>> idea. I guess maybe the Moses+SRILM code defaults to a log probability
>>> value of something like -20 for p(LM-OOV|any-context) regardless of
>>> whether <unk> is present in the LM, so that is why it is OK to use an
>>> open-class LM with SRILM.
>>>
>>> Cheers, Alex
>>>
>>> On Sat, Mar 19, 2011 at 6:03 PM, Chris Dyer <[email protected]> wrote:
>>>> I've started using an OOV feature (fires for each LM-OOV) together
>>>> with an open-vocabulary LM, and found that this improves the BLEU
>>>> score. Typically, the weight learned on the OOV feature (by MERT) is
>>>> quite a bit more negative than the default amount estimated during LM
>>>> training, but it is still far milder than the "avoid at all costs"
>>>> Moses/Joshua OOV default behavior. As a result, there is a small
>>>> increase in the number of OOVs in the output (I have not counted this
>>>> number). However, I find that the BLEU score increases a bit for
>>>> doing this (the magnitude depends on a number of factors), and the
>>>> "extra" OOVs typically occur in places where the possible English
>>>> translation would have been completely nonsensical.
>>>> -Chris
>>>>
>>>> On Sat, Mar 19, 2011 at 12:51 PM, Alexander Fraser <[email protected]> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
>>>>> suggestion to create an open-vocab LM (I usually use closed-vocab),
>>>>> but I think this means that in some context an LM-OOV could be
>>>>> produced in preference to a non-LM-OOV. This should not be the case
>>>>> in standard phrase-based SMT (e.g., using the feature functions of
>>>>> the Moses baseline for the shared task). Instead, Moses should
>>>>> produce the minimal number of LM-OOVs possible.
>>>>>
>>>>> There are exceptions to this when using different feature functions.
>>>>> For instance, we have a paper on trading off transliteration vs.
>>>>> semantic translation (for Hindi-to-Urdu translation), where the
>>>>> transliterations are sometimes LM-OOV but still a better choice than
>>>>> the available semantic translations (which are not LM-OOV). But the
>>>>> overall SMT model we used supports this specific trade-off (and it
>>>>> took work to make the models do this correctly; this is described in
>>>>> the paper).
>>>>>
>>>>> I believe the other three LM packages used with Moses always produce
>>>>> the minimal number of LM-OOVs. I've switched back to Moses+SRILM for
>>>>> now due to this issue. I think it may be the case that Moses+KenLM
>>>>> actually produces the maximal number of OOVs allowed by the phrases
>>>>> loaded, which would be highly undesirable. Empirically, it certainly
>>>>> produces more than Moses+SRILM in my experiments.
>>>>>
>>>>> Thanks and Cheers, Alex
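For reproducing the empirical comparison above, a rough LM-agnostic way to count LM-OOV tokens in decoder output (a sketch: it assumes a one-word-per-line vocabulary file and tokenized output, and the file names are made up):

    awk 'NR==FNR { vocab[$1]; next }
         { for (i = 1; i <= NF; i++) if (!($i in vocab)) oov++ }
         END { print oov + 0, "LM-OOV tokens" }' lm.vocab moses.out

Running the same command over Moses+SRILM and Moses+KenLM outputs gives directly comparable counts.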
