With Moses and a single phrase table, this can happen if a word is not
covered by the phrase table on its own, but only as part of a longer
phrase. In a typical shared task dev or test set with
grow-diag-final-and GIZA alignments, this is restricted to about 5 to
10 words. I think it is possible that for these 5 to 10 words
pass-through directly competes with translation (in Moses), but I
haven't carefully checked this. What I noticed instead is that KenLM
liked to output things that were missing from my LM (this was not
competition between pass-through and translation), so this is similar
to the first scenario Chris outlined.

Chris -- with respect to the second scenario: it wasn't clear to me
whether you have tried allowing pass-through for a larger set of words
than these 5 to 10. How do you build your open-class LM? I assume this
matters a lot.

Cheers, Alex


On Sun, Mar 20, 2011 at 6:58 PM, Chris Dyer <[email protected]> wrote:
> There are two sources:
>
> 1) if you have multiple LMs, and one does not include the target side
> of the bitext, you'll have a different profile of OOVs that are
> actually in the language. Relatedly, I decided to exclude the Europarl
> text from the LM training data, since I knew we would be translating
> newsy genres.
>
> 2) there seems to be some evidence that some translations in the
> phrase table are so bad that leaving some words untranslated is
> "better" than using what's in the phrase table. I can see an
> argument that says that you should use the phrase table entries no
> matter what, but my limited experiments suggest that letting the LM
> make this call at least improves the BLEU score. Interpret that as you
> will.
>
> -C
>
> On Sun, Mar 20, 2011 at 1:28 PM, Philipp Koehn <[email protected]> wrote:
>> Hi,
>>
>> can I ask a dumb question -
>> where do these unknown words come from?
>>
>> Obviously there are words that are unknown in the source,
>> hence placed verbatim in the output, which will likely
>> be unknown to the language model. But there is really not
>> much choice about having them or not (besides -drop-unknown).
>> All translations will have them.
>>
>> Otherwise, all words in the translation model should be known.
>>
>> So, what is the choice here?
>>
>> -phi
>>
>> On Sat, Mar 19, 2011 at 7:19 PM, Kenneth Heafield <[email protected]> 
>> wrote:
>>> I believe -vocab takes a file containing the vocabulary and maps
>>> everything else in your training data to OOV, including producing
>>> n-grams that contain <unk>.  Placing <unk> in the training data will
>>> cause it to be treated like any other word in the corpus, which seems to
>>> be what you want.
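>>>
>>> For example (untested, and the file names are just placeholders), I
>>> would expect the -vocab route to look roughly like this with
>>> ngram-count:
>>>
>>>   # words in corpus.txt that are not listed in vocab.txt get mapped
>>>   # to <unk>, and -unk keeps <unk> as a regular word in the model
>>>   ngram-count -order 5 -unk -vocab vocab.txt -text corpus.txt \
>>>     -interpolate -kndiscount -lm open.arpa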
>>>
>>> With the -100 penalty all you're doing is forcing the OOV feature weight
>>> to be -100 * the LM weight.  I suspect MERT can do a better job of
>>> determining the ratio of these weights for your particular data, but
>>> MERT is known to make mistakes.
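>>>
>>> (E.g., with an LM weight of 0.1 every LM-OOV costs 0.1 * -100 = -10
>>> in the weighted model score, and the only way MERT can soften or
>>> harden that is by moving the LM weight itself.)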
>>>
>>> Pass-through and language model OOV are close, but separate, issues.  A
>>> passed-through phrase table OOV is often still found in the language
>>> model.
>>>
>>> Kenneth
>>>
>>> On 03/19/11 15:01, Alexander Fraser wrote:
>>>> Cool, thanks for the explanation and fix.
>>>>
>>>> What does -vocab do? Is it a trick to replace things that are not in
>>>> the vocab with <unk>? Does explicitly putting <unk> in the training
>>>> data not work? I thought I could do that; the SRILM FAQ seems to
>>>> indicate that it will work, but I haven't tried it yet.
>>>>
>>>> How exactly are you folks training your open-vocab LMs? Are you
>>>> replacing something (singleton LM vocabulary items?) with <unk>, or
>>>> just adding a single line containing <unk> to the training data? I
>>>> think SRILM prunes singletons by default; does that affect <unk> at
>>>> all?
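>>>>
>>>> (What I would naively try -- untested, and the file names are made
>>>> up: take everything seen more than once as the vocabulary and let
>>>> SRILM map the rest to <unk>, e.g.
>>>>
>>>>   # vocabulary = words occurring at least twice in the training data
>>>>   tr -s ' ' '\n' < corpus.txt | sort | uniq -c |
>>>>     awk '$1 > 1 {print $2}' > vocab.txt
>>>>   # open-vocab LM; everything outside vocab.txt becomes <unk>
>>>>   ngram-count -order 5 -unk -vocab vocab.txt \
>>>>     -text corpus.txt -lm open.arpa
>>>>
>>>> but I have no idea whether that matches what you folks actually do.)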
>>>>
>>>> I agree in general about OOVs, but I still think it is questionable
>>>> whether adding a single penalty is enough to let the baseline Moses
>>>> model trade off intelligently between LM-OOV and LM-known words
>>>> (assuming that the parallel corpus is in the LM, which I
>>>> experimentally verified was a good idea many years ago, and I think
>>>> the result probably still holds). But perhaps Chris already has the
>>>> results to prove me wrong. Anyway, I agree that adding this feature
>>>> function is the right solution.
>>>>
>>>> BTW, if you think the Moses model with the addition of the penalty
>>>> can do this trade-off correctly, then you should allow pass-through
>>>> for *all* words, not just words that can wind up uncovered; you
>>>> would then get a further improvement.
>>>>
>>>> Cheers, Alex
>>>>
>>>>
>>>> On Sat, Mar 19, 2011 at 7:18 PM, Kenneth Heafield <[email protected]> 
>>>> wrote:
>>>>> With a closed vocabulary LM, SRILM returns -inf on OOV and moses floors
>>>>> this to LOWEST_SCORE which is -100.0.  If you want identical behavior
>>>>> from KenLM,
>>>>>
>>>>> kenlm/build_binary -u -100.0 foo.arpa foo.binary
>>>>>
>>>>> Unless you passed -vocab to SRILM (and most people don't), <unk> never
>>>>> appears except as a unigram.  Therefore, Chris is not getting any gain
>>>>> from additional conditioning.
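>>>>>
>>>>> (Easy to check on a given ARPA file:
>>>>>
>>>>>   grep '<unk>' foo.arpa
>>>>>
>>>>> If every match sits in the \1-grams: section, there is no
>>>>> higher-order context involving <unk> for the LM to condition on.)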
>>>>>
>>>>> OOVs can be good: names of people who appear in the news, new product
>>>>> names etc.
>>>>>
>>>>> On 03/19/11 14:02, Alexander Fraser wrote:
>>>>>> Hi Folks --
>>>>>>
>>>>>> An LM-OOV feature sounds like a good solution to me. Chris, have you
>>>>>> tried pegging the LM-OOV feature weight at an extremely large
>>>>>> penalty? I suspect the gains you are getting are due to the use of
>>>>>> <unk> in LM conditioning, i.e., p(word|... <unk> ...), rather than
>>>>>> due to allowing more LM-OOVs.
>>>>>>
>>>>>> If the LM-OOV feature weight were defaulted to an extremely large
>>>>>> penalty, we would get the behavior that Moses+SRILM has, but people
>>>>>> who wanted to could try training the weight.
>>>>>>
>>>>>> I think using an open-class LM without such a penalty is not a good
>>>>>> idea. I guess maybe the Moses+SRILM code defaults to a log probability
>>>>>> value of something like -20 for p(LM-OOV|any-context) regardless of
>>>>>> whether <unk> is present in the LM, so that is why it is OK to use an
>>>>>> open-class LM with SRILM.
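>>>>>>
>>>>>> (One way to check what a given LM actually assigns -- file names
>>>>>> are placeholders, but the flags should be right:
>>>>>>
>>>>>>   ngram -lm model.arpa -ppl sample.txt -debug 2
>>>>>>
>>>>>> prints per-word log probabilities; with a closed-vocab SRILM model
>>>>>> the OOVs are reported separately in the OOV count rather than
>>>>>> being assigned some default probability.)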
>>>>>>
>>>>>> Cheers, Alex
>>>>>>
>>>>>>
>>>>>> On Sat, Mar 19, 2011 at 6:03 PM, Chris Dyer <[email protected]> wrote:
>>>>>>> I've started using an OOV feature (it fires for each LM-OOV) together
>>>>>>> with an open-vocabulary LM, and found that this improves the BLEU
>>>>>>> score. Typically, the weight learned on the OOV feature (by MERT) is
>>>>>>> quite a bit more negative than the default amount estimated during LM
>>>>>>> training, but it is still far less severe than the "avoid at all
>>>>>>> costs" Moses/Joshua OOV default behavior. As a result, there is a
>>>>>>> small increase in the number of OOVs in the output (I have not
>>>>>>> counted this number). However, I find that the BLEU score increases
>>>>>>> a bit from doing this (the magnitude depends on a number of factors),
>>>>>>> and the "extra" OOVs typically occur in places where the possible
>>>>>>> English translation would have been completely nonsensical.
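>>>>>>>
>>>>>>> (If I wanted to count them, something like
>>>>>>>
>>>>>>>   tr -s ' ' '\n' < output.txt | grep -vxFf lm_vocab.txt | wc -l
>>>>>>>
>>>>>>> should give a rough token-level figure; output.txt and
>>>>>>> lm_vocab.txt are placeholders for the decoder output and the LM
>>>>>>> vocabulary, one word per line.)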
>>>>>>> -Chris
>>>>>>>
>>>>>>> On Sat, Mar 19, 2011 at 12:51 PM, Alexander Fraser
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw
>>>>>>>> a suggestion to create an open-vocab LM (I usually use
>>>>>>>> closed-vocab), but I think this means that in some contexts an
>>>>>>>> LM-OOV could be produced in preference to a non-LM-OOV. This should
>>>>>>>> not be the case in standard phrase-based SMT (e.g., using the
>>>>>>>> feature functions of the Moses baseline for the shared task).
>>>>>>>> Instead, Moses should produce the minimal number of LM-OOVs
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> There are exceptions to this when using different feature functions.
>>>>>>>> For instance, we have a paper on trading off transliteration vs.
>>>>>>>> semantic translation (for Hindi-to-Urdu translation), where the
>>>>>>>> transliterations are sometimes LM-OOV but still a better choice than
>>>>>>>> the available semantic translations (which are not LM-OOV). But the
>>>>>>>> overall SMT model we used supports this specific trade-off (and it
>>>>>>>> took work to make the models do this correctly; this is described
>>>>>>>> in the paper).
>>>>>>>>
>>>>>>>> I believe for the other three LM packages used with Moses the minimal
>>>>>>>> number of LM-OOVs is always produced. I've switched back to
>>>>>>>> Moses+SRILM for now due to this issue. I think it may be the case that
>>>>>>>> Moses+KenLM actually produces the maximal number of OOVs allowed by
>>>>>>>> the phrases loaded, which would be highly undesirable. Empirically, it
>>>>>>>> certainly produces more than Moses+SRILM in my experiments.
>>>>>>>>
>>>>>>>> Thanks and Cheers, Alex

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
