Hi Matthias,
thank you for your detailed instructions. I will try that out.
Yours,
Per Tunedal

On Fri, Feb 14, 2014, at 17:03, Matthias Huck wrote:
> Hi Per,
> 
> The standard workflow is to run a postprocessing step on the output,
> e.g. with scripts/tokenizer/detokenizer.perl in Moses.
> 
> Usage: ./detokenizer.perl (-l [en|fr|it|cs|...]) < tokenizedfile > detokenizedfile
> Options:
>   -u     ... uppercase the first char in the final sentence.
>   -q     ... don't report detokenizer revision.
>   -b     ... disable Perl buffering.
>   -penn  ... assume input is tokenized as per tokenizer.perl's -penn option.
> 
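To illustrate the entity-decoding part of that postprocessing step, here is a minimal Python sketch. It is not the code of detokenizer.perl, only a stand-in that uses the standard library's html.unescape. The function name decode_entities and the sample string are made up for illustration, and the sketch does not re-attach punctuation or rejoin tokens, which is the detokenizer's job.

```python
import html


def decode_entities(line: str) -> str:
    """Turn HTML/XML character references back into plain characters."""
    return html.unescape(line)


# Made-up sample of entity-escaped output (an assumption, not real Moses output).
sample = "l&apos; &eacute;t&eacute; est fini &amp; oubli&#233;"
print(decode_entities(sample))
```

For real Moses output the detokenizer script quoted above remains the tool to use; this only shows what undoing the escaping amounts to.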
> 
> If you are using EMS, you might want to integrate this into your
> pipeline in the following way:
> 
> [EVALUATION]
> detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l $output-extension"
> 
> Cheers,
> Matthias
> 
> 
> On Fri, 2014-02-14 at 13:14 +0100, Per Tunedal wrote:
> > Hi,
> > following the baseline instructions I've tokenized and recased the text
> > before training. And consequently I get similar output when translating.
> > 
> > Are there any scripts available to get back normal text from the
> > output? In particular, the HTML encoding of some characters, e.g. the
> > French é, è and ê, makes the output uncomfortable to read. A production
> > system would have to produce readable output anyway.
> > 
> > What's the standard workflow?
> > 
> > Yours,
> > Per Tunedal
> > 
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
