Hi Moonloki

What does the translation output look like? (Look at run10.out.) Are the outputs excessively long or short, do they contain a lot of untranslated English words, or are there any other obvious problems? You can't rely on BLEU alone.
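[Editor's note: a quick way to run the checks Barry suggests, as a sketch. The `printf` line creates a tiny stand-in file so the commands are runnable as-is; in practice, point them at the real run10.out from the tuning run. The 3-letter-run grep is only a rough heuristic for untranslated English.]

```shell
# Stand-in for run10.out so the commands below run as-is;
# replace with the real decoder output in practice.
printf '这 是 一 个 测试\nthe cat sat\n我们 去 了\n' > run10.out

# Average tokens per line; compare with the reference side's average.
# A large mismatch suggests overly long or short translations.
awk '{n += NF} END {printf "avg tokens/line: %.2f\n", n/NR}' run10.out

# Count lines still containing runs of 3+ Latin letters, i.e. likely
# untranslated English carried through into the Chinese output.
grep -cE '[A-Za-z]{3,}' run10.out
```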
Make sure that all the Chinese data you use is segmented the same way (including the data used to train the language model), and that all the English data is tokenised and cased the same way.

> Do I need to restrict the training sentence length or change the training
> corpus for the LM in order to get a much higher BLEU score?

Restricting the training set would most likely give you a lower score.

cheers - Barry

On Friday 13 April 2012 11:46:45 Loki Cheng wrote:
> Hi Barry,
>
> Before executing the script "moses-mert.pl", I corrected the content of
> the "moses.ini" file according to this post:
> http://thread.gmane.org/gmane.comp.nlp.moses.user/6407/focus=6409
>
> So I'm sure the LM file is loaded during the tuning stage, and my question
> is why the BLEU score is so low in the log files. Do I need to restrict
> the training sentence length or change the training corpus for the LM in
> order to get a much higher BLEU score? It's such a strange problem.
>
> Best regards
> Moonloki
>
> 2012/4/13 Barry Haddow <[email protected]>
>
> > Hi Moonloki
> >
> > I noticed this in your train-model.perl arguments:
> >
> > > --lm 1:5:/home/loki/Downloads/irstlm-5.70.04/scripts/train.irstlm.gz
> >
> > This means that your language model is applied to the 1st factor
> > (rather than the 0th), but it looks as though you don't have a 1st
> > factor. So you're effectively running without a language model.
> >
> > The correct argument (apply the LM to the 0th, i.e.
> > surface factor, use irstlm) is:
> >
> > --lm 0:5:/home/loki/Downloads/irstlm-5.70.04/scripts/train.irstlm.gz:1
> >
> > If you rerun train-model.perl with --first-step 9 then it should fix
> > your ini file, or you can just fix it manually so that the LM line
> > reads:
> >
> > 1 0 5 /path/to/lm
> >
> > cheers - Barry
> >
> > On Friday 13 April 2012 07:30:12 Loki Cheng wrote:
> > > Hi everyone,
> > >
> > > I finally finished the tuning stage with the script "moses-mert.pl"
> > > on the development corpus, but the BLEU score in the run*.mert.log
> > > files is very low, as listed below:
> > >
> > > ======================================
> > > ==> run1.mert.log <==
> > > Best point: 0.132846 0.0467879 -0.644693 0.0516577 -0.0326 0.0526244
> > > 0.0383472 -0.000443235 => 0.0478053
> > > Stopping... : [27] seconds
> > >
> > > ==> run10.mert.log <==
> > > Best point: 0.130789 0.0831878 -0.49309 0.032857 0.018328 0.0829243
> > > 0.0663681 0.0924561 => 0.0546437
> > > Stopping... : [293] seconds
> > >
> > > ==> run11.mert.log <==
> > > Best point: 0.130738 0.0831553 -0.492897 0.0332344 0.0183208 0.0828919
> > > 0.0663422 0.09242 => 0.0546934
> > > Stopping... : [286] seconds
> > >
> > > ==> run12.mert.log <==
> > > Best point: 0.130136 0.0827725 -0.490628 0.0330814 0.0182365 0.0825104
> > > 0.0660368 0.0965979 => 0.0547353
> > > Stopping... : [328] seconds
> > >
> > > ==> run13.mert.log <==
> > > Best point: 0.130136 0.0827725 -0.490628 0.0330814 0.0182365 0.0825104
> > > 0.0660368 0.0965979 => 0.0547353
> > > Stopping... : [334] seconds
> > >
> > > ==> run2.mert.log <==
> > > Best point: 0.158121 0.0928362 -0.507586 0.00834967 0.0747225 0.0630188
> > > 0.0422052 0.053161 => 0.0476075
> > > Stopping... : [57] seconds
> > >
> > > ==> run3.mert.log <==
> > > Best point: 0.0743158 0.113324 -0.521279 0.0131133 0.0846289 0.0769264
> > > 0.0515195 0.0648931 => 0.048868
> > > Stopping...
> > > : [87] seconds
> > >
> > > ==> run4.mert.log <==
> > > Best point: 0.115927 0.142508 -0.450438 0.0394533 0.10284 0.0641182
> > > 0.0511402 0.0335745 => 0.0489438
> > > Stopping... : [96] seconds
> > >
> > > ==> run5.mert.log <==
> > > Best point: 0.126696 0.103291 -0.473338 0.0321721 0.0495814 0.072928
> > > 0.0589201 0.0830735 => 0.0508062
> > > Stopping... : [123] seconds
> > >
> > > ==> run6.mert.log <==
> > > Best point: 0.127446 0.0971059 -0.453352 0.0345673 0.0387205 0.0944486
> > > 0.0538136 0.100546 => 0.052208
> > > Stopping... : [193] seconds
> > >
> > > ==> run7.mert.log <==
> > > Best point: 0.145001 0.0566422 -0.489566 0.0468676 0.0240972 0.067952
> > > 0.0820283 0.0878453 => 0.0535599
> > > Stopping... : [181] seconds
> > >
> > > ==> run8.mert.log <==
> > > Best point: 0.142602 0.0884247 -0.409176 0.0736586 0.0130789 0.113885
> > > 0.052161 0.107014 => 0.053494
> > > Stopping... : [191] seconds
> > >
> > > ==> run9.mert.log <==
> > > Best point: 0.136603 0.0804459 -0.493247 0.0328674 0.0183338 0.0788154
> > > 0.067202 0.0924855 => 0.0537308
> > > Stopping...
> > > : [272] seconds
> > > ======================================
> > >
> > > My experiment settings:
> > >
> > > *Language model:* 1~5-gram of Chinese (target language)
> > > *Training corpus:* Multi-UN, consisting of about 8.7 million
> > > Chinese-English sentence pairs whose lengths are between 1~100
> > > *Development corpus:* 932 Chinese sentences that have been segmented,
> > > 932 English sentences that have been tokenized
> > > *Chinese segmentation tool:* ICTCLAS
> > > *English tokenizer:* tokenizer.perl
> > >
> > > I ran the training phase with the following command:
> > >
> > > ./train-model.perl --parts 7 --mgiza --mgiza-cpus 2 --parallel
> > > --scripts-root-dir $SCRIPTS_ROOTDIR --corpus
> > > /home/loki/Downloads/moses/scripts/target/scripts-20120222-0301/training/training_corpus/clean
> > > --f eng --e chn --alignment intersect --lm
> > > 1:5:/home/loki/Downloads/irstlm-5.70.04/scripts/train.irstlm.gz
> > >
> > > I wonder whether the low BLEU score is due to the lack of the switch
> > > '--reordering msd-bidirectional-fe', since Moses used the default
> > > distance-based reordering model, which is fairly weak, or whether the
> > > sentences are too long.
> > > I also don't know whether there is a paper on English-to-Chinese
> > > translation; if there is, I would like to compare against its BLEU
> > > scores.
> > >
> > > Any suggestions will be appreciated.
> > >
> > > Best regards
> > > Moonloki
> >
> > --
> > Barry Haddow
> > University of Edinburgh
> > +44 (0) 131 651 3173
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.

--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
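[Editor's note: for readers hitting the same problem, the manual fix Barry describes ("1 0 5 /path/to/lm") lives in the moses.ini language-model stanza. A sketch with a placeholder path, assuming the classic Moses field order of implementation, factor, n-gram order, filename, where implementation 1 is IRSTLM and factor 0 is the surface form:]

```ini
[lmodel-file]
; implementation factor order filename
; 1 = IRSTLM, factor 0 = surface form, order 5 = 5-gram
1 0 5 /path/to/lm
```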
