Hi Barry,

I solved it: to be wrapped correctly, the source file must be in UTF-8
without a BOM and with no indentation spaces.
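
For anyone hitting the same issue, here is a minimal Python sketch of that cleanup. It assumes the input is already valid UTF-8 (with or without a BOM); the function names are hypothetical and are not part of Moses or EMS.

```python
from pathlib import Path


def normalize_sgm_bytes(raw: bytes) -> bytes:
    """Return the content as UTF-8 without a BOM and without leading indentation."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = raw.decode("utf-8-sig")
    # Strip leading spaces and tabs from every line, keeping the line endings.
    lines = [line.lstrip(" \t") for line in text.splitlines(keepends=True)]
    return "".join(lines).encode("utf-8")


def normalize_sgm_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, normalize it, and write the result to dst."""
    Path(dst).write_bytes(normalize_sgm_bytes(Path(src).read_bytes()))
```

Calling normalize_sgm_file on the source sgm file before running the experiment should produce a copy that meets both conditions; writing to a separate output path keeps the original file untouched.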

Mauro

On Fri, Apr 13, 2012 at 10:38 AM, Barry Haddow <[email protected]> wrote:

> Hi Mauro
>
> It's the EVALUATION_testset_wrap stage that generates this file, and it runs
> wrap-xml.perl (found in EMS). I'm not really sure why the wrapping script is
> failing to produce correct output. I guess there must be an error in your
> source sgm file.
>
> cheers - Barry
>
> On Friday 13 April 2012 09:17:35 Mauro Zanotti wrote:
> > Hi Barry,
> >
> > I found the problem: the testset file generated (testset.detokenized.sgm.9)
> > has the wrong structure (the start tag is srcset instead of tstset, which
> > is correctly used for the closing tag at the end of the file) and 2 missing
> > attributes (trglang and sysid).
> > What's the process that creates this file?
> >
> > Thank you
> > Mauro
> >
> >
> > <srcset setid="testiw" srclang="any">
> >   <doc docid="doc" genre="testset" origlang="en">
> >     <seg id="1">Attenzione: ...
> > ...
> >     <seg id="266">Se la compagnia ...</seg>
> >   </doc>
> > </tstset>
> >
> > On Thu, Apr 12, 2012 at 5:11 PM, Barry Haddow <[email protected]> wrote:
> > > Hi Mauro
> > >
> > > > Could someone tell me where the problem is? Could someone send me a
> > > > well-formed sgm file that I can adapt my sgm test set to?
> > >
> > > The files released for wmt12 will work with the nist script. See
> > > http://www.statmt.org/wmt12/translation-task.html
> > > and look for the development sets, as they come with references.
> > > These should show which tags are required.
> > >
> > > cheers - Barry
> > >
> > > On Thursday 12 April 2012 11:39:31 Mauro Zanotti wrote:
> > > > Dear all,
> > > >
> > > > I ran an experiment and it crashed at the final evaluation step.
> > > >
> > > > In the corresponding step directory, REPORTING_report.9.STDERR shows
> > > >
> > > > ERROR (extract_nist_bleu): could not find BLEU score in file
> > > > '/opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu-c.9'
> > > > ERROR (extract_nist_bleu): could not find BLEU score in file
> > > > '/opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu.9'
> > > >
> > > > So I checked the previous step, EVALUATION_testset_nist-bleu.9.
> > > > Its EVALUATION_testset_nist-bleu.9.STDERR shows
> > > >
> > > > Use of uninitialized value $tst_id in string eq at
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl line 488.
> > > > Not the same 'setid' attribute values across files at
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl line 488.
> > > >
> > > > I ran the following command with some debugging prints added and
> > > > found that the problem is that $tst_id is empty.
> > > >
> > > > /opt/tools/moses/scripts/generic/mteval-v13a.pl \
> > > >   -s /opt/tools/moses/scripts/ems/example/data-iw/testset/test-ama.en.sgm \
> > > >   -r /opt/tools/moses/scripts/ems/example/data-iw/testset/test-ama.it.sgm \
> > > >   -t /opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.detokenized.sgm.9 \
> > > >   > /opt/tools/moses/scripts/ems/example/ex7ama/evaluation/testset.nist-bleu.9
> > > >
> > > > My test files have the following structure
> > > >
> > > > <srcset setid="testiw" srclang="any">
> > > >   <doc docid="doc" genre="testset" origlang="en">
> > > >     <seg id="1">Warning: The booking cannot...</seg>
> > > > ...
> > > > <seg id="266">If the...</seg>
> > > >   </doc>
> > > > </srcset>
> > > >
> > > > <refset setid="testiw" trglang="it"  srclang="any">
> > > >   <doc sysid="ref" docid="doc" genre="testset" origlang="en">
> > > >     <seg id="1">Avvertenza: la prenotazione non può...</seg>
> > > > ...
> > > > <seg id="266">Se la...</seg>
> > > >   </doc>
> > > > </refset>
> > > >
> > > > Could someone tell me where the problem is? Could someone send me a
> > > > well-formed sgm file that I can adapt my sgm test set to?
> > > >
> > > > Thank you in advance
> > > > Mauro
> > >
> > > --
> > > Barry Haddow
> > > University of Edinburgh
> > > +44 (0) 131 651 3173
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
> >
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
