Re: [Moses-support] Major bug found in Moses

Read, James C Wed, 17 Jun 2015 11:40:34 -0700

1) So if I've understood you correctly you are saying we have a system that is 
purposefully designed to perform poorly with a disabled LM and this is the 
proof that the LM is the most fundamental part. Any attempt to prove otherwise 
by, e.g. filtering the phrase table to help the dysfunctional search algorithm, 
does not constitute proof that the TM is the most fundamental component of the 
system and if designed correctly can perform just fine on its own but rather 
only evidence that the researcher is not using the system as intended (the 
intention being to break the TM to support the idea that the LM is the most 
fundamental part).


2) If you still feel that the LM is the most fundamental component I challenge 
you to disable the TM and perform LM only translations and see what kind of 
BLEU scores you get.

In conclusion, I do hope that you don't feel that potential investors in MT 
systems lack the intelligence to see through these logical fallacies. Can we 
now just admit that the system is broken and get around to fixing it?

James

________________________________
From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:29 PM
To: Read, James C
Cc: Arnold, Doug; moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses


To paint you a picture:

Imagine you have a rat in a labyrinth (the labyrinth is the TM and the search 
space). That rat is quite good at finding the center of that labyrinth. Now you 
somehow disable that rat's sense of smell, sense of direction, and long-term 
and short-term memory (that's the LM). Can you expect the rat to find the center? 
Or will it just tumble around, bumping into walls and not find anything? That's 
what you did to the decoder when disabling the LM.

Now you prune the TM. In the labyrinth that's like closing all the doors that 
would lead the rat away from the center. There are still a few corridors left, 
but they all point into the general direction of the point where the rat is 
supposed to go. Although it may never quite reach it. Now you put that same 
handicapped rat into the labyrinth where all ways lead more or less to the 
center. Are you really surprised that the clueless rat finds the center nearly 
every time now?

That's what happened. It's not a bug. The LM is probably the strongest feature 
in an MT system. If you take that away you see what happens.

On 2015-06-17 16:22, Read, James C wrote:

All I did was break the link to the language model and then perform filtering. 
How is that a methodological mistake? How else would one test the efficacy of 
the TM in isolation?



I remain convinced that this is undesirable behaviour and therefore a bug.



James


________________________________
From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:12 PM
To: Read, James C
Cc: Arnold, Doug; moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses


Hi James

No, not at all. I would say that is expected behaviour. It's how search spaces 
and optimization work. If anything these are methodological mistakes on your 
side, sorry. You are doing weird things to the decoder and then you are 
surprised to get weird results from it.

On 2015-06-17 16:07, Read, James C wrote:



So, do we agree that this is undesirable behaviour and therefore a bug?

James

________________________________
From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:01 PM
To: Read, James C
Subject: Re: [Moses-support] Major bug found in Moses


As I said. With an unpruned phrase table and a decoder that just optimizes some 
unreasonable set of weights all bets are off, so if you get very low BLEU scores 
there, it's not surprising. It's probably jumping around in a very weird search 
space. With a pruned phrase table you restrict the search space VERY strongly. 
Nearly everything that will be produced is a half-decent translation. So yes, I 
can imagine that would happen.
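
To put rough numbers on how strongly pruning restricts the search space, here 
is a minimal Python sketch. It is only an illustration under simplifying 
assumptions that are not from this thread: the sentence is already split into 
a fixed sequence of source phrases, there is no reordering, and the figure of 
20 options per phrase is arbitrary. The function name is hypothetical.

```python
from math import prod

def count_monotone_outputs(options_per_phrase):
    """Number of distinct outputs when each source phrase is translated
    independently, in order, by picking one of its target options.
    Assumption: fixed segmentation and no reordering (a toy model)."""
    return prod(options_per_phrase)

# Hypothetical 10-phrase sentence with 20 target options per phrase.
unpruned = count_monotone_outputs([20] * 10)

# Same sentence after keeping only one target option per source phrase.
pruned = count_monotone_outputs([1] * 10)

print(unpruned, pruned)
```

Under these assumptions the unpruned table allows 20**10 outputs for the 
decoder to rank, while the pruned table allows exactly one, so the quality of 
the ranking no longer matters.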

Marcin

On 2015-06-17 15:56, Read, James C wrote:

You would expect an improvement of 37 BLEU points?



James


________________________________
From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
Sent: Wednesday, June 17, 2015 4:32 PM
To: Read, James C
Cc: Moses-support@mit.edu; Arnold, Doug
Subject: Re: [Moses-support] Major bug found in Moses


Hi James,

there are many more factors involved than just probability, for instance word 
penalties, phrase penalties etc. To be able to validate your own claim you 
would need to set weights for all those non-probabilities to zero. Otherwise 
there is no hope that moses will produce anything similar to the most probable 
translation. And based on that there is no surprise that there may be different 
translations. A pruned phrase table will naturally produce less noise, so I 
would say the behaviour you describe is quite exactly what I would expect to 
happen.
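
The point about non-probability features can be shown with a small Python 
sketch of a log-linear score. Everything in it is hypothetical: the two 
candidates, their TM probabilities, the feature names and the weights are made 
up for illustration, and the word penalty is modelled simply as minus the 
number of target words. It is not how Moses computes its scores internally.

```python
import math

# Hypothetical candidates for one source phrase: (translation, TM probability).
candidates = [
    ("the house", 0.6),
    ("house", 0.3),
]

def features_for(text, tm_prob):
    """Toy feature vector: log TM probability plus a word penalty.
    Assumption: the penalty is minus the number of target words."""
    return {
        "tm": math.log(tm_prob),
        "word_penalty": -len(text.split()),
    }

def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a log-linear model."""
    return sum(weights[name] * value for name, value in features.items())

def best(weights):
    """Return the candidate text with the highest log-linear score."""
    return max(
        candidates,
        key=lambda c: loglinear_score(features_for(*c), weights),
    )[0]

# With the penalty weight at zero the most probable translation wins.
tm_only = best({"tm": 1.0, "word_penalty": 0.0})

# With a nonzero penalty weight the shorter, less probable one wins.
with_penalty = best({"tm": 1.0, "word_penalty": 1.0})

print(tm_only, "|", with_penalty)
```

In this toy setup the output only matches the most probable translation when 
the weight on the non-probability feature is zero.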

Best,

Marcin

On 2015-06-17 15:26, Read, James C wrote:

Hi all,



I tried unsuccessfully to publish experiments showing this bug in Moses 
behaviour. As a result I have lost interest in attempting to have my work 
published. Nonetheless I think you all should be aware of an anomaly in Moses' 
behaviour which I have thoroughly exposed and should be easy enough for you to 
reproduce.



As I understand it the TM logic of Moses should select the most likely 
translations according to the TM. I would therefore expect a run of Moses with 
no LM to find sentences which are the most likely or at least close to the most 
likely according to the TM.



To test this behaviour I performed two runs of Moses. One with an unfiltered 
phrase table the other with a filtered phrase table which left only the most 
likely phrase pair for each source language phrase. The results were truly 
startling. I observed huge differences in BLEU score. The filtered phrase 
tables produced much higher BLEU scores. The beam size used was the default 
width of 100. I would not have been surprised if the differences in BLEU scores 
were minimal but they were quite high.
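
For readers who want to reproduce the filtering step, a minimal Python sketch 
follows. It is not the script used for the experiments described here. It 
assumes the plain-text Moses phrase-table layout with fields separated by 
`|||` and the scores in the third field, and it assumes that the score at 
index 2 is the direct phrase translation probability; which score was actually 
used for filtering is not stated in this thread. The function name and the 
sample lines are hypothetical.

```python
def filter_top1(lines, score_index=2):
    """Keep only the highest-scoring entry for each source phrase.

    Assumptions: fields are separated by '|||', the third field holds
    space-separated scores, and score_index selects the score to rank by.
    """
    best = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # skip malformed lines
        source = fields[0]
        score = float(fields[2].split()[score_index])
        if source not in best or score > best[source][0]:
            best[source] = (score, line)
    # dicts keep insertion order, so sources appear in first-seen order
    return [line for _, line in best.values()]

# Made-up phrase-table lines for illustration only.
sample = [
    "das haus ||| the house ||| 0.5 0.4 0.7 0.6",
    "das haus ||| house ||| 0.2 0.3 0.2 0.1",
    "haus ||| home ||| 0.1 0.1 0.3 0.2",
    "haus ||| house ||| 0.6 0.5 0.6 0.5",
]

filtered = filter_top1(sample)
for entry in filtered:
    print(entry)
```

Reading a real table would mean streaming lines from the (usually gzipped) 
phrase-table file instead of a list, but the selection logic stays the same.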



I have been unable to find a logical explanation for this behaviour other than 
to conclude that there must be some kind of bug in Moses which causes a TM only 
run of Moses to perform poorly in finding the most likely translations 
according to the TM when there are less likely phrase pairs included in the 
race.



I hope this information will be useful to the Moses community and that the 
cause of the behaviour can be found and rectified.



James


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
