Re: [Moses-support] Penalizing unknown words during bilingual scoring

Miles Osborne Mon, 25 Feb 2008 02:17:27 -0800

Interesting idea there.  I think you will find that dealing with unknown
words is outside of what the current version of factored translation can do,
since it assumes a one-to-one mapping between words and factors.

(Or, if you have unknown words, then you will have corresponding gaps in the
factors.)

The most direct way to deal with unknown (source) words would be to add a
new feature function that counts them: a score of zero would mean that
there were no unknown words (hurrah!), and the higher the score, the more
unknown words there are.  The associated lambda (feature weight) would then
connect that count with translation quality.
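A minimal sketch of that idea, assuming a standard log-linear model in which each hypothesis score is the weighted sum of its feature values. This is not Moses code: the helper names, the toy vocabulary, the made-up probabilities and the weights are all hypothetical and chosen for illustration only. In practice the lambda for the new feature would be set by tuning, not by hand.

```python
import math
from typing import Dict, Iterable, Set


def unknown_word_count(source_tokens: Iterable[str], vocab: Set[str]) -> int:
    """Feature value: number of source tokens not covered by the (hypothetical)
    phrase-table vocabulary. 0 means every source word is known."""
    return sum(1 for tok in source_tokens if tok not in vocab)


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: sum over features of lambda * feature value."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical vocabulary and input sentence, for illustration only.
vocab = {"das", "haus", "ist", "klein"}
source = "das haus ist winzig".split()

features = {
    "tm": math.log(0.2),   # made-up translation model log-probability
    "lm": math.log(0.05),  # made-up language model log-probability
    "unk": unknown_word_count(source, vocab),
}
# Hypothetical weights; a negative lambda on "unk" penalizes unknown words.
weights = {"tm": 1.0, "lm": 0.5, "unk": -1.0}

print(features["unk"])  # 1 ("winzig" is not in the vocabulary)
print(loglinear_score(features, weights))
```

The count itself is cheap to compute; whether it helps depends entirely on the weight that tuning assigns to it.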

For this to work, you'd need to make sure the tuning set contained unknown
words, obviously.

Thinking about this some more, it might not give translation improvements,
since for a given source sentence all competing target hypotheses will have
the same number of unknown source words.  What it might do, however, is
indirectly reweight those development sentences which only partially
contributed towards the overall set of lambdas.
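The first point can be checked with a toy example. Assuming a log-linear model and an unknown-word count that depends only on the source sentence, every competing hypothesis gets the same extra term lambda * count, so the ranking (and hence the decoder's choice) cannot change whatever the lambda is. The feature values and weights below are made up, and the sketch does not model the second, indirect effect on tuning.

```python
from typing import Dict, List


def best_hypothesis(hypotheses: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    """Return the index of the highest-scoring hypothesis under a log-linear model."""
    scores = [sum(weights[k] * v for k, v in h.items()) for h in hypotheses]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up feature values for three competing translations of ONE source sentence.
# The unknown-word count depends only on the source, so it is 2 for all of them.
hyps = [
    {"tm": -3.1, "lm": -7.0, "unk": 2},
    {"tm": -2.4, "lm": -8.2, "unk": 2},
    {"tm": -4.0, "lm": -5.9, "unk": 2},
]

for unk_weight in (0.0, -0.5, -10.0):
    weights = {"tm": 1.0, "lm": 0.6, "unk": unk_weight}
    # The same index wins for every unk_weight: the penalty shifts all scores equally.
    print(unk_weight, best_hypothesis(hyps, weights))
```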

Miles

On 25/02/2008, André Lynum <[EMAIL PROTECTED]> wrote:
>
> Hi, I'm working on modifying Moses to provide translation model scores
> for a given source/translation sentence pair.
>
>
> I'm using the decoder, constraining the hypotheses it generates, and
> then I examine the hypothesis stack which covers the most of the
> source input (I'm wondering if I should look at all generated
> hypotheses). Here I look for the highest-scoring hypothesis, but I
> will need to account for the parts of the source and translation that
> are not covered by the hypothesis.
>
> I was thinking of adding a penalizing factor to the total score of the
> hypotheses for each unknown word in the input pair. This is
> motivated by the notion that each unknown word may be generated from a
> "null" phrase pair. But I'm unsure about what factor to use, and the
> scoring part of the Moses code is a bit complicated, so I would
> appreciate any insight into what factor to use and how to apply it to
> the hypothesis score.
>
> My initial notion was to use the same score as is used for translation
> options generated for unknown words, but I can't quite see where this is
> set (as the negative word weight in TargetPhrase::SetScore(), or as -inf
> in the translationoption constructor?). Any help understanding this
> part of the code would be greatly appreciated.
>
>
> Regards
>
>
>
> -andré lynum
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.

