Re: [Moses-support] Major bug found in Moses

amittai axelrod Fri, 19 Jun 2015 07:32:14 -0700

speaking of cobbling together a good translation from imperfect parts:

google:


A motorist heard on the radio the announcement: "Caution Caution On the 
N9 you will encounter a ghost driver Please drive far right and do not 
overtake!.!"
The driver: "What do you mean a dozens dozens?!""

microsoft:

"A motorist hears the announcement on the radio: 'warning! Caution! On 
the N9, a (s) satisfies you. Go quite right and not overtake!"
The car driver: "what do you mean one? Dozens! Dozens!"

:)
~amittai

On 6/19/15 10:19, Marcin Junczys-Dowmunt wrote:
> German joke:
>
> Ein Autofahrer hört im Radio die Durchsage: "Achtung! Achtung! Auf der
> N9 kommt Ihnen ein Geisterfahrer entgegen. Fahren Sie bitte ganz rechts
> und überholen Sie nicht!"
> Der Autofahrer: "Was heißt hier einer? Dutzende! Dutzende!"
>
> On 2015-06-19 16:12, Read, James C wrote:
>
>> So we've gone from
>>
>> 1) Acknowledging that the search algorithm performs poorly with no LM,
>> tuning or pruning despite the fact the search space clearly contains
>> high quality translations
>>
>> 2) to a public display of en-masse reluctance to acknowledge that such
>> is an undesirable quality of the system
>>
>> 3) to resorting to censorship not only in the literature but also on a
>> public mailing list rather than acknowledge point 2.
>>
>> And your conclusion is that after being a witness to such behaviour I
>> would still have a desire to contribute to this field?!? Why YES. I
>> would love to keep banging my head against a brick wall. I have no
>> other preferred pastimes.
>>
>> James
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Lane Schwartz <dowob...@gmail.com>
>> *Sent:* Friday, June 19, 2015 5:04 PM
>> *To:* Read, James C
>> *Cc:* Philipp Koehn; Burger, John D.; moses-support@mit.edu
>> *Subject:* Re: [Moses-support] Major bug found in Moses
>> James,
>> You may see the techniques that exist as outdated, wrong-headed, and
>> inefficient. You have the right to hold that opinion. It may even be
>> that history proves you right. Progress in science is made by people
>> posing questions - often questions that challenge the status quo - and
>> then doing experiments to answer those questions.
>> However, it is incumbent upon you, the proponent of a new idea, to
>> design good experiments to attempt to prove or disprove your new
>> hypothesis. Dispassionately showing the relative merits and
>> shortcomings of your technique with the existing state of the art is
>> part of that process.
>> I, along with numerous other people on this list, have attempted in
>> good faith to answer your questions, and to provide you with our
>> perspective based on our collective understanding of the problem.
>> You, in turn, have responded belligerently.
>> I suggest that you have a frank conversation with your academic
>> advisor or other appropriate mentor regarding your future. If you
>> intend to pursue a successful career in science, academia, government,
>> or industry, you would do well to reconsider the manner in which you
>> interact with other people, especially people with whom you disagree.
>> In the meantime, I would respectfully request that, until you learn
>> how to interact respectfully with other adults, you refrain from
>> posting to this mailing list.
>> Sincerely,
>> Lane Schwartz
>>
>> On Fri, Jun 19, 2015 at 8:45 AM, Read, James C <jcr...@essex.ac.uk> wrote:
>>
>>     According to your book, which I have on my desk, the job of the TM
>>     is to model the most likely translations and the job of
>>     the decoder is to intelligently search the space of translations
>>     to find the most likely one(s) (I'm paraphrasing, of course).
>>
>>     Would you like to retract that position and republish a next
>>     edition of your book which openly states that Moses when used with
>>     no LM or tuning or pruning can and should be expected to perform
>>     very poorly and select only the least likely translations?
>>
>>     Don't you find it worrying in the slightest that at least 90%
>>     of your code base could be thrown out of the window and high
>>     scoring results can be obtained with a simple phrase pair based
>>     rule based system?
>>
>>     Which would you prefer? Would you prefer to consume computational
>>     resources calculating probabilities or get straight to the answer
>>     with simple logic and low computational requirements?
>>
>>     BE HONEST!
>>
>>     James
>>
>>
>>
>>     ------------------------------------------------------------------------
>>     *From:* moses-support-boun...@mit.edu on behalf of Philipp Koehn
>>     <p...@jhu.edu>
>>     *Sent:* Thursday, June 18, 2015 9:39 PM
>>     *To:* Burger, John D.
>>     *Cc:* moses-support@mit.edu
>>
>>     *Subject:* Re: [Moses-support] Major bug found in Moses
>>     Hi,
>>     I am a great fan of open source software, but there is a danger in
>>     viewing its inner workings as a black box, which leads to
>>     strange theories of what is going on instead of real understanding.
>>     But we can try to understand it.
>>     In the reported experiment, the language model was removed,
>>     while the rest of the system was left unchanged.
>>     The default untuned weights that train-model.perl assigns to a
>>     model are the following:
>>     WordPenalty0= -1
>>     PhrasePenalty0= 0.2
>>     TranslationModel0= 0.2 0.2 0.2 0.2
>>     Distortion0= 0.3
>>     Since no language model is used, a positive distortion cost will
>>     lead the decoder to not use any reordering at all. That's a
>>     good thing in this case.
>>     The word penalty is used to counteract the language model's
>>     preference for short translations. Unchecked, there is now a
>>     bias towards too long translations.
>>     Then there is the translation model with its equal weights for
>>     p(e|f) and p(f|e). The p(e|f) weight and scores are fine and well.
>>     However, p(f|e) only makes sense if you have Bayes' theorem
>>     in mind and a language model at your back. But in the
>>     reported setup, there is now a bias to translate into rare English
>>     phrases, since these will have high p(f|e) scores.
>>     My best guess is that the reported setup translates common
>>     function words (such as prepositions) into very long rare English
>>     phrases - word penalty likes it, p(f|e) likes it, p(e|f) does not mind
>>     enough - which produces a lot of rubbish.
>>     By filtering for p(e|f) those junky phrases are removed from the
>>     phrase table, restricting the decoder to more reasonable choices.
>>     I contend that this is not a bug in the software, but a bug in usage.
>>     -phi
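
A minimal Python sketch of the scoring argument in the message above. It is not Moses code. The two phrase pairs and their scores are invented for illustration, the score order p(f|e), lex(f|e), p(e|f), lex(e|f) is an assumption, and the word-penalty feature is assumed to contribute -1 per target word (which is what would make a weight of -1 favour longer output, as described). Only the weights are taken from the message.

```python
import math
from dataclasses import dataclass

# Untuned weights quoted in the message above (no language model).
WEIGHTS = {
    "word_penalty": -1.0,
    "phrase_penalty": 0.2,
    "translation_model": (0.2, 0.2, 0.2, 0.2),
}


@dataclass
class PhrasePair:
    target: str
    # Assumed order: p(f|e), lex(f|e), p(e|f), lex(e|f)
    scores: tuple


def model_score(pair):
    """Weighted sum of log-probabilities and penalties for one phrase pair."""
    tm = sum(
        w * math.log(p)
        for w, p in zip(WEIGHTS["translation_model"], pair.scores)
    )
    # Assumption: the word-penalty feature is -1 per target word, so a
    # weight of -1 adds +1 to the score for every word produced.
    wp = WEIGHTS["word_penalty"] * -len(pair.target.split())
    # Assumption: the phrase-penalty feature is 1 per phrase used.
    pp = WEIGHTS["phrase_penalty"] * 1.0
    return tm + wp + pp


# Hypothetical candidates for a single source function word.
common = PhrasePair("of", (0.05, 0.05, 0.6, 0.6))
rare = PhrasePair("of the aforementioned committee", (1.0, 0.5, 0.001, 0.01))

best_by_model = max([common, rare], key=model_score)
best_by_direct = max([common, rare], key=lambda p: p.scores[2])

print("weighted model prefers:", best_by_model.target)
print("p(e|f) alone prefers:  ", best_by_direct.target)
```

With these made-up numbers the weighted sum prefers the long, rare phrase, while ranking by p(e|f) alone prefers the short, common one.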
>>
>>     On Thu, Jun 18, 2015 at 11:32 AM, Burger, John D. <j...@mitre.org> wrote:
>>
>>         On Jun 17, 2015, at 11:54, Read, James C <jcr...@essex.ac.uk> wrote:
>>
>>         > The question remains why isn't the system capable of finding
>>         > the most likely translations without the LM?
>>
>>         Even if it weren't ill-posed, I don't find this to be an
>>         interesting question at all. This is like trying to improve
>>         automobile transmissions by disabling the steering. These are
>>         the parts we have, and they all work together.
>>
>>         It's not as if human translators don't use their own internal
>>         language models.
>>
>>         - John Burger
>>           MITRE
>>
>>         > Evidently, if you filter the phrase table then the LM is not
>>         > as important as you might feel. The question remains why isn't
>>         > the system capable of finding the most likely translations
>>         > without the LM? Why do I need to filter to help the system
>>         > find them? This is undesirable behaviour. Clearly a bug.
>>         >
>>         > I include the code I used for filtering. As you can see, only
>>         > the 4th score was used as the filtering criterion.
>>         >
>>         > #!/usr/bin/perl -w
>>         > #
>>         > # Program filters phrase table to leave only phrase pairs
>>         > # with probability above a threshold
>>         > #
>>         > use strict;
>>         > use warnings;
>>         > use Getopt::Long;
>>         >
>>         > my $min;
>>         > my $phrase_table;
>>         > my $filtered_table;
>>         >
>>         > GetOptions(     'min=f'         => \$min,
>>         >                 'out=s'         => \$filtered_table,
>>         >                 'in=s'          => \$phrase_table);
>>         > # defined() is used so that a threshold of 0 is accepted
>>         > die "ERROR: must give threshold and phrase table input file and output file\n"
>>         >         unless (defined($min) && $phrase_table && $filtered_table);
>>         > die "ERROR: file $phrase_table does not exist\n" unless (-e $phrase_table);
>>         > open (PHRASETABLE, '<', $phrase_table)
>>         >         or die "FATAL: Could not open phrase table $phrase_table\n";
>>         > open (FILTEREDTABLE, '>', $filtered_table)
>>         >         or die "FATAL: Could not open output file $filtered_table\n";
>>         >
>>         > while (my $line = <PHRASETABLE>)
>>         > {
>>         >         chomp $line;
>>         >         my @columns = split ('\|\|\|', $line);
>>         >
>>         >         # check that file is a well formatted phrase table
>>         >         if (scalar @columns < 4)
>>         >         {
>>         >                 die "ERROR: input file is not a well formatted phrase table. "
>>         >                     . "A phrase table must have at least four columns, each column separated by |||\n";
>>         >         }
>>         >
>>         >         # get the probability and keep the line only if it is
>>         >         # greater than the threshold
>>         >         # NOTE: if the score column starts with a space (fields
>>         >         # delimited by " ||| "), split /\s+/ returns an empty
>>         >         # first element, so $scores[3] is the third score listed
>>         >         my @scores = split /\s+/, $columns[2];
>>         >         if ($scores[3] > $min)
>>         >         {
>>         >                 print FILTEREDTABLE $line."\n";
>>         >         }
>>         > }
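
The script above keeps every phrase pair whose score exceeds a threshold. The original report further down the thread describes a filter that keeps only the most likely phrase pair for each source phrase. A minimal Python sketch of that variant follows, under stated assumptions: fields are separated by `|||`, the scores are whitespace-separated in the third field, and the ranking score is the third one (index 2, assumed to be p(e|f)). The function names and the sample table are hypothetical.

```python
def parse_line(line):
    """Split one phrase-table line into (source, target, scores)."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError("not a well formatted phrase table line: %r" % line)
    scores = [float(s) for s in fields[2].split()]
    return fields[0], fields[1], scores


def keep_best_per_source(lines, score_index=2):
    """Keep, for each source phrase, the line with the highest chosen score.

    score_index=2 is an assumption (third score taken to be p(e|f)).
    Ties keep the first line seen.
    """
    best = {}
    for line in lines:
        source, _target, scores = parse_line(line)
        score = scores[score_index]
        if source not in best or score > best[source][0]:
            best[source] = (score, line)
    # dicts preserve insertion order, so output follows first appearance
    return [line for _score, line in best.values()]


# Hypothetical three-line phrase table.
table = [
    "von ||| of ||| 0.05 0.05 0.6 0.6 ||| 0-0 ||| 10 12 8",
    "von ||| of the aforementioned committee ||| 1.0 0.5 0.001 0.01 ||| 0-0 ||| 1 12 1",
    "haus ||| house ||| 0.8 0.7 0.9 0.8 ||| 0-0 ||| 20 18 17",
]

filtered = keep_best_per_source(table)
for line in filtered:
    print(line)
```

Restricting each source phrase to a single candidate removes the choice from the decoder entirely, which is a much stronger constraint than thresholding.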
>>         >
>>         >
>>         >
>>         > From: Matt Post <p...@cs.jhu.edu>
>>         > Sent: Wednesday, June 17, 2015 5:25 PM
>>         > To: Read, James C
>>         > Cc: Marcin Junczys-Dowmunt; moses-support@mit.edu; Arnold, Doug
>>         > Subject: Re: [Moses-support] Major bug found in Moses
>>         >
>>         > I think you are misunderstanding how decoding works. The
>>         > highest-weighted translation of each source phrase is not
>>         > necessarily the one with the best BLEU score. This is why the
>>         > decoder retains many options, so that it can search among them
>>         > (together with their reorderings). The LM is an important
>>         > component in making these selections.
>>         >
>>         > Also, how did you weight the many probabilities attached to
>>         > each phrase (to determine which was the most probable)? The
>>         > tuning phase of decoding selects weights designed to optimize
>>         > BLEU score. If you weighted them evenly, that is going to
>>         > exacerbate the problem in this experiment.
>>         >
>>         > matt
>>         >
>>         >
>>         >
>>         >> On Jun 17, 2015, at 10:22 AM, Read, James C <jcr...@essex.ac.uk> wrote:
>>         >>
>>         >> All I did was break the link to the language model and then
>>         >> perform filtering. How is that a methodological mistake? How
>>         >> else would one test the efficacy of the TM in isolation?
>>         >>
>>         >> I remain convinced that this is undesirable behaviour and
>>         >> therefore a bug.
>>         >>
>>         >> James
>>         >>
>>         >>
>>         >> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>>         >> Sent: Wednesday, June 17, 2015 5:12 PM
>>         >> To: Read, James C
>>         >> Cc: Arnold, Doug; moses-support@mit.edu
>>         >> Subject: Re: [Moses-support] Major bug found in Moses
>>         >>
>>         >> Hi James
>>         >> No, not at all. I would say that is expected behaviour.
>>         >> It's how search spaces and optimization work. If anything,
>>         >> these are methodological mistakes on your side, sorry. You
>>         >> are doing weird things to the decoder and then you are
>>         >> surprised to get weird results from it.
>>         >> On 2015-06-17 16:07, Read, James C wrote:
>>         >>>
>>         >>> So, do we agree that this is undesirable behaviour and
>>         >>> therefore a bug?
>>         >>>
>>         >>> James
>>         >>>
>>         >>> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>>         >>> Sent: Wednesday, June 17, 2015 5:01 PM
>>         >>> To: Read, James C
>>         >>> Subject: Re: [Moses-support] Major bug found in Moses
>>         >>>
>>         >>> As I said, with an unpruned phrase table and a decoder
>>         >>> that just optimizes some unreasonable set of weights, all bets
>>         >>> are off, so if you get very low BLEU scores there, it's not
>>         >>> surprising. It's probably jumping around in a very weird
>>         >>> search space. With a pruned phrase table you restrict the
>>         >>> search space VERY strongly. Nearly everything that will be
>>         >>> produced is a half-decent translation. So yes, I can imagine
>>         >>> that would happen.
>>         >>> Marcin
>>         >>> On 2015-06-17 15:56, Read, James C wrote:
>>         >>> You would expect an improvement of 37 BLEU points?
>>         >>>
>>         >>> James
>>         >>>
>>         >>>
>>         >>> From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>>         >>> Sent: Wednesday, June 17, 2015 4:32 PM
>>         >>> To: Read, James C
>>         >>> Cc: Moses-support@mit.edu; Arnold, Doug
>>         >>> Subject: Re: [Moses-support] Major bug found in Moses
>>         >>>
>>         >>> Hi James,
>>         >>> there are many more factors involved than just
>>         >>> probability, for instance word penalties, phrase penalties,
>>         >>> etc. To be able to validate your own claim you would need to
>>         >>> set the weights for all those non-probabilities to zero.
>>         >>> Otherwise there is no hope that Moses will produce anything
>>         >>> similar to the most probable translation. And based on that,
>>         >>> it is no surprise that there may be different translations. A
>>         >>> pruned phrase table will naturally produce less noise, so I
>>         >>> would say the behaviour you describe is exactly what I would
>>         >>> expect to happen.
>>         >>> Best,
>>         >>> Marcin
>>         >>> On 2015-06-17 15:26, Read, James C wrote:
>>         >>> Hi all,
>>         >>>
>>         >>> I tried unsuccessfully to publish experiments showing this
>>         >>> bug in Moses behaviour. As a result I have lost interest in
>>         >>> attempting to have my work published. Nonetheless I think you
>>         >>> all should be aware of an anomaly in Moses' behaviour which I
>>         >>> have thoroughly exposed and which should be easy enough for
>>         >>> you to reproduce.
>>         >>>
>>         >>> As I understand it the TM logic of Moses should select the
>>         >>> most likely translations according to the TM. I would
>>         >>> therefore expect a run of Moses with no LM to find sentences
>>         >>> which are the most likely or at least close to the most likely
>>         >>> according to the TM.
>>         >>>
>>         >>> To test this behaviour I performed two runs of Moses: one
>>         >>> with an unfiltered phrase table, the other with a filtered
>>         >>> phrase table which left only the most likely phrase pair for
>>         >>> each source language phrase. The results were truly startling.
>>         >>> I observed huge differences in BLEU score. The filtered phrase
>>         >>> tables produced much higher BLEU scores. The beam size used
>>         >>> was the default width of 100. I would not have been surprised
>>         >>> if the differences in BLEU scores were minimal but they were
>>         >>> quite high.
>>         >>>
>>         >>> I have been unable to find a logical explanation for this
>>         >>> behaviour other than to conclude that there must be some kind
>>         >>> of bug in Moses which causes a TM only run of Moses to perform
>>         >>> poorly in finding the most likely translations according to
>>         >>> the TM when there are less likely phrase pairs included in the
>>         >>> race.
>>         >>>
>>         >>> I hope this information will be useful to the Moses
>>         >>> community and that the cause of the behaviour can be found and
>>         >>> rectified.
>>         >>>
>>         >>> James
>>         >>>
>>         >>> _______________________________________________
>>         >>> Moses-support mailing list
>>         >>>
>>         >>> Moses-support@mit.edu
>>         >>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>         >>>
>>         >>>
>>         >>>
>>         >>>
>>         >>
>>         >>
>>
>>
>>
>>
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away.  It is time to go elsewhere.  The best thing about space travel
>> is that it made it possible to go elsewhere.
>>                 -- R.A. Heinlein, "Time Enough For Love"
>>
>
>
>
>