Hi! Well, it doesn't have to be the same target translation, just the exact same score vector (and the same feature vector, of course). I agree that MERT is working correctly; my mail was only ever about efficiency. In my experiments MERT took the major part of the running time, and I believe others have the same problem, so I care about making it faster.
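The "storing weights for each list entry" idea above could be sketched roughly as follows. This is only an illustration, not Moses code: `collapse_duplicates` is a hypothetical helper, and hypotheses are assumed to be (target text, feature vector) pairs.

```python
from collections import Counter

def collapse_duplicates(nbest):
    """Collapse duplicate hypotheses into weighted entries.

    nbest: list of (target_text, feature_vector) pairs.
    Entries count as duplicates only if BOTH the target text and the
    feature vector match exactly. Each surviving entry carries its
    multiplicity as a weight, so a sampling-based optimiser (e.g. PRO)
    could in principle reproduce the duplicate-aware behaviour on the
    smaller list by sampling proportionally to the weights.
    """
    counts = Counter((text, tuple(feats)) for text, feats in nbest)
    return [(text, list(feats), weight)
            for (text, feats), weight in counts.items()]
```

Whether weighting the samples this way really matches PRO's behaviour on the uncollapsed list is exactly the open question in the thread.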
If you don't plan on an implementation, I will write my own (i.e. modify the existing one). I can then get back to you once I know it is working (and faster). Concerning pro-mode, I think that behaviour could be simulated by storing weights for each list entry. It would be a little more complicated to implement, though.

Cheers,
Thomas

________________________________
From: Barry Haddow <bhad...@staffmail.ed.ac.uk>
To: moses-support@mit.edu; Thomas Schoenemann <thomas_schoenem...@yahoo.de>
Sent: Wednesday, 30 November 2011, 12:35
Subject: Re: [Moses-support] Removing duplicates when merging nbest lists for MERT

Hi Thomas

Yes, you're correct, mert doesn't remove duplicates in the nbest lists. It's something that we intended to do (and probably mentioned in the mert paper) but somehow never got around to. As Lane pointed out, you have to be careful to do the duplicate removal correctly: you can only consider hypotheses to be duplicates if they have the same target text and the same feature values.

The mert optimisation actually does duplicate removal implicitly during the optimisation, since duplicate hypotheses contribute the same line to the envelope. However, removing duplicates in the extractor could potentially be more efficient.

For pro, however, duplicates could make a difference to the optimisation, as they affect the sampling. I recently re-implemented the pro extraction to make it more efficient, and again did intend to do de-duping, but haven't got around to it yet. It would be interesting to know if de-duping makes a difference to the outcome.

cheers - Barry

On Tuesday 29 Nov 2011 20:06:20 Thomas Schoenemann wrote:
> Hi everyone!
>
> We all know that MERT gets slower in the later iterations. This is not
> surprising, as the n-best lists of all previous iterations are merged. I
> believe this is quite important for translation performance.
>
> Still, it seems important to me to get the merged lists as small as
> possible.
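The de-dup rule Barry states (same target text AND same feature values) could be sketched like this when merging the n-best lists of successive iterations. Again a hypothetical illustration, not the actual extractor code; hypotheses are assumed to be (target text, feature vector) pairs.

```python
def merge_nbest(lists):
    """Merge n-best lists from several tuning iterations, dropping
    duplicates. Two hypotheses are duplicates only if both the target
    text and the feature vector are identical; same text with
    different features must be kept, since it contributes a different
    line to the optimisation envelope.
    """
    seen = set()
    merged = []
    for nbest in lists:
        for text, feats in nbest:
            key = (text, tuple(feats))  # hashable duplicate key
            if key not in seen:
                seen.add(key)
                merged.append((text, feats))
    return merged
```

Lookups in the `seen` set are constant-time on average, so the de-duping pass is linear in the total list size and should cost less than optimising over the longer, redundant list.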
> A quick inspection of mert/extractor indicates that duplicates
> are _not_ removed. Can anyone confirm this? And is this really not done
> anywhere else, e.g. in mert/mert?
>
> Removing duplicates in the extractor should be easy to implement, and I
> don't think it will take more running time than one gains from the smaller
> lists.
>
> Best,
> Thomas (currently University of Pisa)
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support