OK, clear. My question was mostly because I saw all these @-@ markers in the
tokenized (not yet detokenized) system output. I just realized that the detokenizer
does join the split hyphens back to the adjacent words in the
translation output.
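For illustration, the rejoining step can be approximated in a few lines of Python. This is a minimal sketch, not the Moses detokenizer: `detokenize_hyphens` is a hypothetical helper that only handles the @-@ marker and ignores all the other spacing rules a real detokenizer applies.

```python
import re

def detokenize_hyphens(line: str) -> str:
    """Rejoin hyphens that aggressive tokenization split off as " @-@ ".

    Hypothetical helper: it only handles the @-@ marker, not the other
    spacing rules a full detokenizer applies.
    """
    # Collapse "word @-@ word" back to "word-word", absorbing surrounding spaces.
    return re.sub(r"\s*@-@\s*", "-", line)

print(detokenize_hyphens("a well @-@ known , over @-@ the @-@ top example"))
# → a well-known , over-the-top example
```

A plain " - " token (a hyphen that was already separated by spaces) is left untouched, so only the "forced" splits are undone.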
Two points (just tell me if I got it right):
- This affects automatic metric scoring if one runs an automatic
metric between the tokenized output and the tokenized reference, both
containing "forced" @-@ and "simple" - (which is the case for the
multi-bleu tool as used in EMS). Multi-bleu would then count a split
hyphenated word as three tokens (a trigram) instead of a single word.
- We do the same as compound splitting, except that we do it for every
splittable term, since a hyphen clearly marks the split point.
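To make the first point concrete, here is a small Python sketch that counts clipped n-gram matches the way BLEU's modified precision does. It is not multi-bleu.perl itself (no brevity penalty, no geometric mean over n); `ngrams` and `clipped_matches` are hypothetical helpers and the sentence pair is invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(hyp, ref, n):
    """(clipped n-gram matches, total hypothesis n-grams), as in BLEU's modified precision."""
    h, r = ngrams(hyp.split(), n), ngrams(ref.split(), n)
    return sum(min(c, r[g]) for g, c in h.items()), sum(h.values())

# Invented example: the same sentences, scored in tokenized and detokenized form.
hyp_tok, ref_tok = "a well @-@ known idea", "a well @-@ known fact"
hyp_det, ref_det = "a well-known idea", "a well-known fact"
for n in range(1, 5):
    print(n, clipped_matches(hyp_tok, ref_tok, n), clipped_matches(hyp_det, ref_det, n))
# 1 (4, 5) (2, 3)
# 2 (3, 4) (1, 2)
# 3 (2, 3) (0, 1)
# 4 (1, 2) (0, 0)
```

With @-@ kept, the one matched hyphenated word contributes three unigram matches and a trigram match, so scores on tokenized text are not directly comparable to scores on detokenized text.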
cheers,
Lefteris
On 23/01/12 17:12, Philipp Koehn wrote:
Hi,
The aggressive tokenization is turned on with the flag -a and it
splits up hyphenated words, which is useful when dealing with
over-the-top hyphenation by hyphen-fanatics, since it reduces
out-of-vocabulary words.
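A rough Python approximation of what -a does to hyphens, and of why it reduces out-of-vocabulary words. The regex is an assumption modelled on the behaviour described in this thread (a hyphen between two alphanumeric characters becomes " @-@ "); it is not code taken from tokenizer.perl, and `aggressive_split`, `oov_tokens` and the training/test strings are hypothetical.

```python
import re

# Assumed approximation of the -a rule: a hyphen directly between two
# alphanumeric characters ([^\W_] = letter or digit) becomes " @-@ ".
# The lookahead does not consume the next character, so chains like
# "over-the-top" are split at every hyphen.
AGGRESSIVE_HYPHEN = re.compile(r"([^\W_])-(?=[^\W_])")

def aggressive_split(line: str) -> str:
    return AGGRESSIVE_HYPHEN.sub(r"\1 @-@ ", line)

def oov_tokens(test: str, train: str, split: bool) -> list[str]:
    """Tokens of `test` never seen in `train`, with or without hyphen splitting."""
    prep = aggressive_split if split else (lambda s: s)
    vocab = set(prep(train).split())
    return [t for t in prep(test).split() if t not in vocab]

print(aggressive_split("over-the-top hyphenation"))  # over @-@ the @-@ top hyphenation

train = "an over-the-top plan by hyphen-happy fanatics"
test = "hyphen-fanatics"
print(oov_tokens(test, train, split=False))  # ['hyphen-fanatics']
print(oov_tokens(test, train, split=True))   # []
```

Without splitting, the unseen compound is a single OOV token; after splitting, each of its parts was already seen in training.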
-phi
On 23 Jan 2012 14:42, "Eleftherios Avramidis"
<[email protected]> wrote:
Hi all,
I noticed that tokenizer.perl now does "aggressive" hyphen
splitting, an option which is enabled by default for users of EMS. This
produces hyphens that look like @-@, which of course later appear in the
word alignments and the decoding results.
Does anybody know why this option would be useful?
cheers
Lefteris
--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30 238 95-1806
Fax. +49-30 238 95-1810
-------------------------------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------------------------------
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support