Hi Barry,

I solved it: to be wrapped correctly, the source file must be UTF-8 without a BOM and with no indentation spaces.
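For anyone hitting the same problem, here is a minimal sketch of that cleanup in Python 3. It is not the script used in this thread and not part of Moses; the names `normalize_sgm` and `normalize_file` are hypothetical. It only strips a leading UTF-8 BOM and leading spaces or tabs on each line, and assumes the file is otherwise valid UTF-8.

```python
import codecs
from pathlib import Path


def normalize_sgm(raw: bytes) -> bytes:
    """Return UTF-8 bytes without a BOM and without leading indentation."""
    # Drop a UTF-8 byte order mark if the file starts with one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Decoding raises UnicodeDecodeError if the input is not valid UTF-8.
    text = raw.decode("utf-8")
    # Remove leading spaces and tabs from every line, keeping line endings.
    lines = [line.lstrip(" \t") for line in text.splitlines(keepends=True)]
    return "".join(lines).encode("utf-8")


def normalize_file(src: str, dst: str) -> None:
    """Hypothetical convenience wrapper: read src, write the cleaned copy to dst."""
    Path(dst).write_bytes(normalize_sgm(Path(src).read_bytes()))
```

Something like `normalize_file("test-ama.en.sgm", "test-ama.en.clean.sgm")` could then be run on the source file before the EMS wrap step; the output file name here is only an example.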
Mauro

On Fri, Apr 13, 2012 at 10:38 AM, Barry Haddow <[email protected]> wrote:

> Hi Mauro
>
> It's the EVALUATION_testset_wrap stage that generates this file, and it runs
> wrap-xml.perl (found in EMS). I'm not really sure why the wrapping script is
> failing to produce correct output. I guess there must be an error in your
> source sgm file,
>
> cheers - Barry
>
> On Friday 13 April 2012 09:17:35 Mauro Zanotti wrote:
> > Hi Barry,
> >
> > I found the problem: the testset file generated (testset.detokenized.sgm.9)
> > has a wrong structure (the start tag is srcset instead of tstset, properly
> > inserted at the end of the file) and 2 missing tag (trglang and sysid), .
> > What's the process that creates this file?
> >
> > Thank you
> > Mauro
> >
> >
> > <srcset setid="testiw" srclang="any">
> > <doc docid="doc" genre="testset" origlang="en">
> > <seg id="1">Attenzione: ...
> > ...
> > <seg id="266">Se la compagnia ...</seg>
> > </doc>
> > </tstset>
> >
> > On Thu, Apr 12, 2012 at 5:11 PM, Barry Haddow <[email protected]> wrote:
> > > Hi Mauro
> > >
> > > > Could someone tell me where the problems is? Could someone send me a
> > > > well-formed sgm file that I can adopt my sgm test set to?
> > >
> > > The files released for wmt12 will work with the nist script. See
> > > http://www.statmt.org/wmt12/translation-task.html
> > > and look for the development sets, as they come with references.
> > > These should show which tags are required.
> > >
> > > cheers - Barry
> > >
> > > On Thursday 12 April 2012 11:39:31 Mauro Zanotti wrote:
> > > > Dear all,
> > > >
> > > > I ran an experiment and it crashed at the final evaluation step.
> > > >
> > > > In the relative step directory the REPORTING_report.9.STDERR shows
> > > >
> > > > ERROR (extract_nist_bleu): could not find BLEU score in file
> > > > '/opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu-c.9'
> > > > ERROR (extract_nist_bleu): could not find BLEU score in file
> > > > '/opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu.9'
> > > >
> > > > So I checked the previous step EVALUATION_testset_nist-bleu.9
> > > > the EVALUATION_testset_nist-bleu.9.STDERR shows
> > > >
> > > > Use of uninitialized value $tst_id in string eq at
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl line 488.
> > > > Not the same 'setid' attribute values across files at
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl line 488.
> > > >
> > > > I launched the following command adding some debugging print and I
> > > > found the problem is that $tst_id is empty.
> > > >
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl -s
> > > > /opt/tools/moses/scripts/ems/example/data-iw/testset/test-ama.en.sgm -r
> > > > /opt/tools/moses/scripts/ems/example/data-iw/testset/test-ama.it.sgm -t
> > > > /opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.detokenized.sgm.9
> > > > /opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu.9
> > > >
> > > > My test files have the following structure
> > > >
> > > > <srcset setid="testiw" srclang="any">
> > > > <doc docid="doc" genre="testset" origlang="en">
> > > > <seg id="1">Warning: The booking cannot...</seg>
> > > > ...
> > > > <seg id="266">If the...</seg>
> > > > </doc>
> > > > </srcset>
> > > >
> > > > <refset setid="testiw" trglang="it" srclang="any">
> > > > <doc sysid="ref" docid="doc" genre="testset" origlang="en">
> > > > <seg id="1">Avvertenza: la prenotazione non può...</seg>
> > > > ...
> > > > <seg id="266">Se la...</seg>
> > > > </doc>
> > > > </refset>
> > > >
> > > > Could someone tell me where the problems is? Could someone send me a
> > > > well-formed sgm file that I can adopt my sgm test set to?
> > > >
> > > > Thank you in advance
> > > > Mauro
> > >
> > > --
> > > Barry Haddow
> > > University of Edinburgh
> > > +44 (0) 131 651 3173
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
>
> --
> Barry Haddow
> University of Edinburgh
> +44 (0) 131 651 3173
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
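The quoted thread traces the failure to a wrapped test-set file whose start tag is srcset rather than tstset and which lacks trglang and sysid. Below is a rough Python 3 sketch of a pre-flight check for exactly those symptoms. The function name `check_tstset` is hypothetical, and the list of expected attributes is an assumption taken from this thread only, not from the mteval-v13a.pl documentation.

```python
import re


def check_tstset(sgm_text: str) -> list:
    """Return a list of problems found in a wrapped test-set file (empty if none)."""
    problems = []
    # The first tag in the file should be the opening <tstset ...> tag.
    first_tag = re.search(r"<\s*(\w+)([^>]*)>", sgm_text)
    if first_tag is None:
        return ["no tags found"]
    name, attrs = first_tag.group(1), first_tag.group(2)
    if name.lower() != "tstset":
        problems.append(f"start tag is <{name}>, expected <tstset>")
    # Attributes assumed to be required, based on the symptoms in this thread.
    for attr in ("setid", "srclang", "trglang"):
        if not re.search(rf'\b{attr}\s*=\s*"[^"]*"', attrs):
            problems.append(f"start tag is missing {attr}")
    # Every <doc> tag should carry a sysid attribute.
    for doc in re.finditer(r"<\s*doc\b([^>]*)>", sgm_text, flags=re.IGNORECASE):
        if not re.search(r'\bsysid\s*=\s*"[^"]*"', doc.group(1)):
            problems.append("a <doc> tag is missing sysid")
    if not re.search(r"</\s*tstset\s*>", sgm_text, flags=re.IGNORECASE):
        problems.append("closing </tstset> tag not found")
    return problems
```

Run on the contents of testset.detokenized.sgm.9 before the nist-bleu step, a non-empty result would point at the wrap stage rather than at the scoring script.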
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
