Hi, Aggressive tokenization is turned on with the -a flag. It splits hyphenated words at each hyphen and marks the split as @-@. This is useful when dealing with over-the-top hyphenation by hyphen-fanatics, since it reduces the number of out-of-vocabulary words.
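To show why this reduces out-of-vocabulary words, here is a minimal Python sketch of the hyphen rule. It is an illustrative approximation, not the Moses code. It assumes the rule is "a hyphen between two alphanumeric characters becomes @-@". The function names split_hyphens, join_hyphens and count_oov, and the toy vocabulary, are hypothetical.

```python
import re

# Hypothetical approximation of the hyphen rule behind tokenizer.perl -a:
# a hyphen with an alphanumeric character on both sides becomes " @-@ ".
# [^\W_] means "word character except underscore", i.e. letters and digits.
_INTRA_WORD_HYPHEN = re.compile(r"(?<=[^\W_])-(?=[^\W_])")


def split_hyphens(text: str) -> str:
    """Split intra-word hyphens, marking each one as @-@."""
    return _INTRA_WORD_HYPHEN.sub(" @-@ ", text)


def join_hyphens(text: str) -> str:
    """Undo split_hyphens by collapsing ' @-@ ' back into a bare hyphen."""
    return text.replace(" @-@ ", "-")


def count_oov(tokens: list[str], vocab: set[str]) -> int:
    """Count tokens that do not appear in the (toy) training vocabulary."""
    return sum(1 for tok in tokens if tok not in vocab)


if __name__ == "__main__":
    # Toy vocabulary, assumed to come from aggressively tokenized training data.
    vocab = {"an", "over", "the", "top", "claim", "@-@"}
    sentence = "an over-the-top claim"

    plain = sentence.split()                     # 'over-the-top' stays one token
    aggressive = split_hyphens(sentence).split()  # split into pieces around @-@

    print(plain, count_oov(plain, vocab))
    print(aggressive, count_oov(aggressive, vocab))
    print(join_hyphens(split_hyphens(sentence)) == sentence)
```

Without the split, over-the-top is a single unseen token. With it, every piece is in the toy vocabulary. Using a distinct marker such as @-@ rather than a bare hyphen keeps the split reversible. It is also why the marker shows up in word alignments and decoder output until it is joined back.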
-phi

On 23 Jan 2012 14:42, "Eleftherios Avramidis" <[email protected]> wrote:
> Hi all,
>
> I noticed that tokenizer.perl now does "Aggressive" hyphen splitting, an
> option which is enabled by default for the users of EMS. This produces
> hyphens that look like @-@ which of course later appear in the word
> alignments and the decoding results.
> Does somebody know why this option would be useful?
>
> cheers
> Lefteris
>
> --
> MSc. Inf. Eleftherios Avramidis
> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
> Tel. +49-30 238 95-1806
> Fax. +49-30 238 95-1810
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
