Hi Xianhua

Good point about the evaluation. I don't have an easy answer, but these two
papers might help:

http://www.aclweb.org/anthology-new/P/P11/P11-1103.pdf
http://www.aclweb.org/anthology-new/W/W11/W11-2102.pdf

cheers - Barry

On Tuesday 01 November 2011 09:37:18 Li Xianhua wrote:
> Hi Barry,
>
> Thank you very much. I have run my baseline three times and the
> variation is 0.11 BLEU-4 on the dev set and 0.28 on the test set. I think
> I should do more experiments. By the way, since the problem I'm focusing
> on is reordering in Japanese-Chinese translation, I'm afraid BLEU won't
> reflect the differences because it is based on n-grams. In other words,
> I'm afraid it is not very sensitive to reordering. So, which evaluation
> tool would you suggest in this case?
>
> ----------------------------------------------------
> Best wishes!
> Xianhua Li
> Information Technology Laboratory
> Fujitsu Research & Development Center Co.,LTD.
> 13F Tower A, Ocean International Center,
> No.56 Dong Si Huan Zhong Rd, Chaoyang District, Beijing, China, 100025
> E-mail:[email protected]
>
>
> -----Original Message-----
> From: Barry Haddow [mailto:[email protected]]
> Sent: 1 November 2011 17:17
> To: Li Xianhua
> Cc: [email protected]
> Subject: Re: Reply: [Moses-support] mert-moses.pl script
>
> Hi Xianhua
>
> > to reduce the server burden? So the question is: will the
> > improvement we get when running deterministic mert still be observed
> > when running nondeterministic mert?
>
> It depends how large the improvement is. If you run (nondeterministic)
> mert a few times on your baseline then you will see how much variation
> there is, and that will tell you (roughly) how big your improvement needs
> to be before you can be confident about it.
>
> cheers - Barry
>
> On Tuesday 01 November 2011 09:06:47 Li Xianhua wrote:
> > Hi Barry,
> >
> > Thanks for your suggestion. I am doing Japanese-Chinese machine
> > translation, solving the reordering problem. I want to make sure
> > whether my method improves the results. Our server is not strong
> > enough and several people run experiments on it.
> > So I am wondering,
> > can I first run deterministic mert to make sure the method is
> > effective, then run nondeterministic mert several times and average
> > them to get the results, in order to reduce the server burden? So the
> > question is: will the improvement we get when running deterministic
> > mert still be observed when running nondeterministic mert?
> >
> > Best wishes!
> > Xianhua Li
> >
> > -----Original Message-----
> > From: Barry Haddow [mailto:[email protected]]
> > Sent: 1 November 2011 16:51
> > To: [email protected]
> > Cc: Li Xianhua
> > Subject: Re: [Moses-support] mert-moses.pl script
> >
> > Hi Xianhua
> >
> > Way 1 uses no random restarts at all, so each iteration starts where
> > the last one left off, which is why the results are always the same.
> > The other methods both use random restarts (which make them different
> > from Way 1) and they each use fixed seeds (so they are deterministic)
> > but the seeds are different (so they are different from each other).
> >
> > My guess is that Ways 2 & 3 will give better results as they explore a
> > larger part of the parameter space. However the best method is to
> > follow Clark et al. and run mert several times and average. This
> > advice also applies to most other methods of tuning MT systems, in
> > particular online algorithms where the order of the data may matter.
> >
> > cheers - Barry
> >
> > On Tuesday 01 November 2011 01:42:26 Li Xianhua wrote:
> > > Hi all,
> > >
> > > About the moses mert problem Neda mentioned, I tried three ways to
> > > make MERT deterministic, but the results differ between them.
> > > Way 1: edit mert-moses.pl line 105: "my $_RANDOM-RESTARTS=20",
> > > change 20 to 0
> > > Way 2: switch to the mert-moses.pl call: --mertargs=" -r $seed " as
> > > Patrik said
> > > Way 3: activate flag "--predictable-seed", as Nicola and Barry said
> > >
> > > I ran 9 experiments in total, 3 for each way with the same corpus
> > > and parameters, and compared the results. The results within the
> > > same way are identical, but the results across different ways, on
> > > the same corpus, are different. So my questions are:
> > > Are the three ways right to make MERT deterministic? Which is best?
> > > Why are they different?
> > >
> > > Best wishes!
> > > Xianhua Li
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > --
> > Barry Haddow
> > University of Edinburgh
> > +44 (0) 131 651 3173
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
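
The advice in this thread (tune several times, look at the run-to-run
variation, and only then judge an improvement) can be sketched roughly in
Python as below. This is a minimal illustration, not part of Moses: the
function names are hypothetical, the BLEU scores are made-up numbers, and
comparing the gain against the observed range is only a crude heuristic, not
a significance test.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample std dev, max-min range) of BLEU scores
    obtained from repeated tuning runs of one system."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return mean(scores), stdev(scores), max(scores) - min(scores)


def improvement_exceeds_noise(baseline_scores, system_scores):
    """Crude check: is the gain in mean BLEU larger than the larger of
    the two run-to-run ranges? Returns (gain, bool)."""
    base_mean, _, base_range = summarize_runs(baseline_scores)
    sys_mean, _, sys_range = summarize_runs(system_scores)
    gain = sys_mean - base_mean
    return gain, gain > max(base_range, sys_range)


# Hypothetical BLEU scores from three tuning runs each (illustration only).
baseline = [24.10, 24.21, 24.15]
reordered = [24.60, 24.75, 24.52]

gain, confident = improvement_exceeds_noise(baseline, reordered)
print(f"gain in mean BLEU: {gain:.2f}, exceeds run-to-run range: {confident}")
```

With only three runs per system the estimate of the variation is itself
noisy, so more runs give a more trustworthy comparison.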
