On Tue, Nov 29, 2011 at 8:06 PM, Thomas Schoenemann
<[email protected]> wrote:
> Hi everyone!
>
> We all know that MERT gets slower in the later iterations. This is not
> surprising, as the n-best lists of all previous iterations are merged. I
> believe this merging is quite important for translation performance.
>
> Still, it seems important to me to keep the merged lists as small as
> possible. A quick inspection of mert/extractor indicates that duplicates are
> _not_ removed. Can anyone confirm this? And is this really not done anywhere
> else, e.g. in mert/mert ?
>
> Removing duplicates in the extractor should be easy to implement, and I don't
> think it would take more running time than one gains from the smaller lists.
>
> Best,
>  Thomas (currently University of Pisa)

If I recall correctly, not removing duplicates is an intentional feature.

It's possible to get the same translation string from different
derivations, each with a different set of feature values. You'd want
MERT to know about each of these possibilities, so you don't want to
throw away the duplicates.
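To illustrate the point, here is a minimal Python sketch with made-up feature values. It assumes a linear model score w . h, as in MERT-style tuning; `best_translation` and `dedupe_by_string` are hypothetical helpers, not part of Moses. Dropping the second "the house" entry just because its string was already seen changes which translation wins under the given weights.

```python
from typing import List, Sequence, Tuple

# Hypothetical n-best entry for one source sentence: (translation, feature vector).
# The feature values below are invented for illustration only.
Entry = Tuple[str, Tuple[float, ...]]

def model_score(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Linear model score w . h."""
    return sum(w * h for w, h in zip(weights, feats))

def best_translation(entries: List[Entry], weights: Sequence[float]) -> str:
    """Return the translation string of the highest-scoring entry."""
    return max(entries, key=lambda e: model_score(weights, e[1]))[0]

def dedupe_by_string(entries: List[Entry]) -> List[Entry]:
    """Naive deduplication: keep only the first entry for each translation string."""
    seen = set()
    kept: List[Entry] = []
    for text, feats in entries:
        if text not in seen:
            seen.add(text)
            kept.append((text, feats))
    return kept

nbest: List[Entry] = [
    ("the house", (1.0, 0.0)),  # first derivation of "the house"
    ("the house", (0.0, 1.0)),  # same string, different feature values
    ("a house", (0.6, 0.6)),
]

weights = (0.0, 1.0)
print(best_translation(nbest, weights))                    # the house
print(best_translation(dedupe_by_string(nbest), weights))  # a house
```

With the full list the second derivation scores 1.0 and wins; after string-only deduplication the best remaining score is 0.6, so "a house" is selected instead.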

Now, if you are getting lots of duplicates with exactly the same
feature values, that's a different story: those add nothing for MERT
and could be removed.
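A minimal sketch of that safer kind of deduplication, assuming each merged n-best entry is represented as a (translation, feature vector) pair grouped by sentence id. This representation and the `dedupe_exact` helper are hypothetical, not how mert/extractor actually stores or processes n-best lists. An entry is dropped only if both its string and its feature vector match an earlier entry for the same sentence.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (translation, feature vector) per entry.
Entry = Tuple[str, Tuple[float, ...]]

def dedupe_exact(nbest_by_sentence: Dict[int, List[Entry]]) -> Dict[int, List[Entry]]:
    """Drop entries that repeat both the translation and the feature vector
    of an earlier entry for the same sentence; keep first-seen order."""
    result: Dict[int, List[Entry]] = {}
    for sent_id, entries in nbest_by_sentence.items():
        seen = set()
        kept: List[Entry] = []
        for text, feats in entries:
            key = (text, feats)  # exact match on string AND features
            if key not in seen:
                seen.add(key)
                kept.append((text, feats))
        result[sent_id] = kept
    return result

# Toy lists from two tuning iterations merged for sentence 0 (invented values).
merged = {
    0: [
        ("the house", (1.0, 0.0)),
        ("a house", (0.6, 0.6)),
        ("the house", (1.0, 0.0)),  # exact repeat: removed
        ("the house", (0.0, 1.0)),  # same string, new features: kept
    ]
}
print(dedupe_exact(merged)[0])
```

Exact float comparison is reasonable here only on the assumption that repeated entries are parsed from identical text in the n-best files, so they yield identical values.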

Cheers,
Lane

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
