Hi,

We are currently developing an MT system from parallel data containing some sentence pairs that are repeated very often; one of them appears over 16000 times. A certain amount of repetition of identical sentence pairs probably makes sense, since repeated pairs generally indicate reliable translations and should thus be assigned higher probabilities. But there is presumably a threshold above which adding even more occurrences is no longer meaningful.
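For concreteness, here is a minimal Python sketch of the kind of filtering I have in mind: a hard cap on the number of copies of each pair, optionally combined with a proportionate (e.g. square-root) reduction. The function name, the cap value, and the scaling function are just placeholders, not something we have settled on:

```python
from collections import Counter
import math

def cap_duplicates(pairs, cap=100, scale=None):
    """Limit how often each identical sentence pair is kept.

    pairs: iterable of (source, target) tuples.
    cap:   hard maximum number of copies kept per distinct pair.
    scale: optional function mapping the original count to a reduced
           count (e.g. math.sqrt for a proportionate reduction);
           every pair is kept at least once.
    """
    counts = Counter(pairs)
    filtered = []
    for pair, n in counts.items():
        keep = n if scale is None else max(1, int(scale(n)))
        filtered.extend([pair] * min(keep, cap))
    return filtered

# Toy corpus: one pair repeated 16000 times, one rare pair.
corpus = [("hallo", "hello")] * 16000 + [("danke", "thanks")] * 3
reduced = cap_duplicates(corpus, cap=100, scale=math.sqrt)
```

With these settings the very frequent pair is first scaled down to sqrt(16000) ≈ 126 copies and then capped at 100, while the rare pair survives with a single copy.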
Does anyone have advice on how we could establish such a threshold? Would a proportionate reduction work best? Has anyone experimented with filtering corpora in this way? Or is reducing the occurrences of very frequent sentence pairs a bad idea after all?

Looking forward to your replies,
Micha

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
