On 12/29/2011 10:57 AM, Bill_Lang(Gmail) wrote:
> Hi Moses Friends,
> These days I am running phrase-based and hierarchical Moses. My
> training corpus is FBIS (240k sentence pairs) for Chinese-to-English
> translation. My Moses version was updated on Dec 19, 2011. After
> training, I used NIST02, NIST03, and NIST05 for tuning, respectively,
> and got the following BLEU scores:
>
> Phrase-based, tuned on NIST02: NIST02 0.3176, NIST03 0.2827, NIST05 0.2761
> Phrase-based, tuned on NIST03: NIST02 0.3141, NIST03 0.2861, NIST05 0.2746
> Phrase-based, tuned on NIST05: NIST02 0.3109, NIST03 0.2831, NIST05 0.2822
>
> Hierarchical, tuned on NIST02: NIST02 0.3403, NIST03 0.1620, NIST05 0.1577
> Hierarchical, tuned on NIST03: NIST02 0.3259, NIST03 0.1732, NIST05 0.1669
> Hierarchical, tuned on NIST05: NIST02 0.3286, NIST03 0.1689, NIST05 0.1678
>
> The phrase-based results look normal across training, tuning, and
> testing. For the hierarchical system, the NIST02 scores also look
> normal, but the NIST03 and NIST05 scores are strangely low. Strangest
> of all, even when tuning on NIST03 or NIST05, the NIST02 BLEU is
> still normal.
If I understand your numbers correctly, your hiero system gets good performance on NIST02 but not on the other test sets. In the past, I had some problems with filtering the rule table on the test data: the script combine_factors.pl crashed when there were multiple spaces in the input. The result was that many rules were missing and the output was badly translated. You may want to check the log file, or check whether your output contains an unusually large number of untranslated Chinese words.

hope this helps,
Holger

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
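A quick way to act on Holger's suggestion is to count how many tokens in the decoder output still contain Chinese characters; a large fraction suggests rules were dropped during filtering. This is a minimal sketch, not part of Moses itself — the file name "output.en" is a placeholder for your system's output, and it simply tests tokens against the CJK Unified Ideographs block:

```python
# Rough check for untranslated source-language words in MT output:
# count whitespace-separated tokens that still contain CJK characters.
import re
import sys

CJK = re.compile(r'[\u4e00-\u9fff]')  # CJK Unified Ideographs block

def count_untranslated(lines):
    """Return (cjk_tokens, total_tokens) over an iterable of sentences."""
    cjk_tokens = total_tokens = 0
    for line in lines:
        for tok in line.split():
            total_tokens += 1
            if CJK.search(tok):
                cjk_tokens += 1
    return cjk_tokens, total_tokens

if __name__ == "__main__":
    # "output.en" is a hypothetical file name for the decoder output.
    path = sys.argv[1] if len(sys.argv) > 1 else "output.en"
    with open(path, encoding="utf-8") as f:
        cjk, total = count_untranslated(f)
    print("%d/%d tokens contain CJK characters (%.1f%%)"
          % (cjk, total, 100.0 * cjk / max(total, 1)))
```

Comparing this percentage between the NIST02 output and the NIST03/NIST05 outputs should show quickly whether the bad test sets are full of pass-through Chinese.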
