Hi,

Aggressive tokenization is turned on with the -a flag. It splits up
hyphenated words, which is useful when dealing with over-the-top
hyphenation by hyphen-fanatics, since it reduces the number of
out-of-vocabulary words.
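
To make the effect concrete, here is a rough Python sketch of that kind of hyphen splitting. It is an approximation of the behaviour described in this thread, not the actual code in tokenizer.perl; the function names split_hyphens and join_hyphens are made up for illustration, and the exact character classes the Perl script uses may differ.

```python
import re

# Assumption: a hyphen counts as "word-internal" when it sits directly
# between two letters or digits. [^\W_] matches any alphanumeric character.
_INNER_HYPHEN = re.compile(r"(?<=[^\W_])-(?=[^\W_])")


def split_hyphens(text: str) -> str:
    """Replace each word-internal hyphen with the separate token '@-@'."""
    return _INNER_HYPHEN.sub(" @-@ ", text)


def join_hyphens(text: str) -> str:
    """Undo split_hyphens on output text by gluing '@-@' tokens back."""
    return text.replace(" @-@ ", "-")


if __name__ == "__main__":
    sentence = "an over-the-top example"
    tokens = split_hyphens(sentence)
    print(tokens)                 # an over @-@ the @-@ top example
    print(join_hyphens(tokens))   # an over-the-top example
```

With the split, a rare compound like "over-the-top" becomes the common tokens "over", "the" and "top" plus the marker, so fewer tokens are unseen at translation time. The marker also makes the split reversible, since a plain " - " in the text is left alone.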

-phi

On 23 Jan 2012 14:42, "Eleftherios Avramidis" <[email protected]>
wrote:

> Hi all,
>
> I noticed that tokenizer.perl now does "aggressive" hyphen splitting, an
> option which is enabled by default for users of EMS. This produces
> hyphens that look like @-@, which of course later appear in the word
> alignments and the decoding results.
> Does somebody know why this option would be useful?
>
> cheers
> Lefteris
>
> --
> MSc. Inf. Eleftherios Avramidis
> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
> Tel. +49-30 238 95-1806
>
> Fax. +49-30 238 95-1810
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>