Hi Vincent,

On Thu, 2015-09-24 at 22:37 +0200, Vincent Nguyen wrote:
> Thanks Matthias for the detailed explanation.
> I think I have most of it in mind except not really understanding how 
> this one works :
> 
> "Difficult sentences generally have worse model score than easy ones but
> may still be useful for training."

Well, your data selection method may discard training instances that are
somehow hard to decode, e.g. because of complex sentence structure or
because of rare vocabulary. But that doesn't necessarily mean that the
sentence pairs you're removing are bad ones. You should manually inspect
some samples if possible.

I haven't tried it, but I suspect that you'd get a higher decoder score on
the 1-best decoder output of the first of the following two input sentences:

(1) " Merci ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! "
(2) " Je l' ai vécu moi-même en personne quand j' ai eu mon diplôme à Barnard 
College en 2002 . "

(Just as a simple made-up example.)

If we assume that you have a correct English target sentence for both of
those sentences in your training data, I wonder which of the two you
could learn more from?
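
To make that concrete, here's a minimal sketch (in Python, with invented
per-sentence scores and an arbitrary threshold, not real Moses output) of
how a score-based cut-off behaves on a pair of sentences like these:

    # Made-up per-sentence model scores for the two examples above;
    # closer to zero means the decoder found the sentence easier.
    pairs = [
        ('" Merci ! ! ! ... "', -0.3),                        # easy, but trivial
        ('" Je l\' ai vécu moi-même ... en 2002 . "', -2.1),  # hard, but informative
    ]

    THRESHOLD = -1.0  # arbitrary cut-off, for illustration only

    kept = [(src, score) for src, score in pairs if score >= THRESHOLD]
    print(kept)
    # Only the trivial sentence survives the cut, even though the second
    # pair is the one the model could actually learn something new from.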

If you're doing what I think, then you're also basically just assessing
whether the source side of the sentence pair is easy to translate. Does
this tell you anything about the target sentence? If your data is noisy,
the target side might be misaligned, or even in a third language entirely.
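
As a rough complement to a source-only score, something like the sketch
below (plain Python, arbitrary limits, and "plausible_pair" is just a name
I made up) would at least catch grossly misaligned targets; for the
third-language case you'd additionally want to run a language identifier
over the target side:

    def plausible_pair(src, tgt, max_ratio=3.0):
        """Cheap sanity check for gross misalignment; no substitute
        for looking at samples by hand."""
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            return False
        # Wildly different token counts usually point to a misaligned pair.
        ratio = max(src_len, tgt_len) / float(min(src_len, tgt_len))
        return ratio <= max_ratio

    print(plausible_pair("Merci beaucoup .", "Thank you very much ."))  # True
    print(plausible_pair("Merci .", "This target clearly does not belong to that source at all ."))  # False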

Cheers,
Matthias


