Hi Matthias,
thank you for your detailed instructions. I will try that out.

Yours,
Per Tunedal

On Fri, Feb 14, 2014, at 17:03, Matthias Huck wrote:
> Hi Per,
>
> The standard workflow is to run a postprocessing step on the output,
> e.g. with scripts/tokenizer/detokenizer.perl in Moses.
>
> Usage ./detokenizer.perl (-l [en|fr|it|cs|...]) < tokenizedfile > detokenizedfile
> Options:
>   -u ... uppercase the first char in the final sentence.
>   -q ... don't report detokenizer revision.
>   -b ... disable Perl buffering.
>   -penn ... assume input is tokenized as per tokenizer.perl's -penn option.
>
>
> If you are using EMS, you might want to integrate this into your
> pipeline in the following way:
>
> [EVALUATION]
> detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l $output-extension"
>
> Cheers,
> Matthias
>
>
> On Fri, 2014-02-14 at 13:14 +0100, Per Tunedal wrote:
> > Hi,
> > following the baseline instructions I've tokenized and recased the text
> > before training. And consequently I get similar output when translating.
> >
> > Are there any scripts available to get back a normal text from the
> > output? Especially the html-encoding for some characters e.g. the french
> > é, è and ê makes reading uncomfortable. A production system would have
> > to produce readable output anyway.
> >
> > What's the standard work flow?
> >
> > Yours,
> > Per Tunedal
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
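
For readers who want to see what such a postprocessing step does to the output, here is a minimal Python sketch. It is not the Moses detokenizer and does not reproduce its language-specific rules; it only decodes HTML-style character entities (the é, è, ê problem mentioned above) and re-attaches common punctuation. The function name rough_detokenize and the punctuation rules are assumptions made for illustration; for real use, detokenizer.perl with the appropriate -l option remains the tool named in the thread.

```python
import html
import re


def rough_detokenize(line: str) -> str:
    """Rough illustration of postprocessing tokenized MT output.

    Hypothetical helper, not part of Moses: it decodes character
    entities and applies two generic punctuation rules only.
    """
    # Decode entities such as &eacute; or &apos; into real characters.
    text = html.unescape(line)
    # Drop the space before closing punctuation: "word ," -> "word,"
    # (English-style spacing; French typography differs for ; : ! ?).
    text = re.sub(r"\s+([,.;:!?)\]])", r"\1", text)
    # Drop the space after opening brackets: "( word" -> "(word"
    text = re.sub(r"([(\[])\s+", r"\1", text)
    return text.strip()


if __name__ == "__main__":
    print(rough_detokenize("Hello , world ( test ) !"))
    print(rough_detokenize("l&apos; &eacute;t&eacute; est arriv&eacute; ."))
```

Note that the second example still leaves a space after the apostrophe; handling cases like that correctly per language is exactly what the -l option of detokenizer.perl is for.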
