OK, clear. I asked mostly because I saw all these @-@ in the system output before de-tokenization. I just realized that the de-tokenizer joins the split hyphens back to the adjacent words in the translation output.

Two points (just tell me if I got it right):
- This affects automatic metric scoring if one runs an automatic metric on the tokenized output and the tokenized reference, both of which contain "forced" @-@ and "simple" - (which is the case for the multibleu tool as used in EMS). Multibleu would then measure a trigram instead of a single word.
- We do the same as in compound splitting, with the difference that we do it in every case where there is a splittable term, since a hyphen is a clear indication.
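
To make the scoring point concrete, here is a minimal Python sketch of clipped n-gram matching on split and unsplit text. It is not the multibleu tool, only an illustration of the counting; the function names and the example sentences are made up for this purpose.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hyp, ref, n):
    """Return (matched, total) n-gram counts for whitespace-tokenized strings.

    Each hypothesis n-gram is credited at most as often as it occurs in the
    reference, which is the clipping used by BLEU-style precision.
    """
    hyp_counts = ngrams(hyp.split(), n)
    ref_counts = ngrams(ref.split(), n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


if __name__ == "__main__":
    # Hypothetical sentence pair, once with split hyphens and once without.
    pairs = {
        "split": ("a well @-@ known approach", "a well @-@ known method"),
        "unsplit": ("a well-known approach", "a well-known method"),
    }
    for name, (hyp, ref) in pairs.items():
        for n in (1, 2, 3):
            print(name, n, clipped_matches(hyp, ref, n))
```

With split hyphens, "well @-@ known" contributes three unigrams and a full trigram of its own, so the same word carries more weight in the n-gram counts than it does as a single token.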

cheers,
Lefteris



On 23/01/12 17:12, Philipp Koehn wrote:

Hi,

The aggressive tokenization is turned on with the flag -a and it splits up hyphenated words, which is useful when dealing with over-the-top hyphenation by hyphen-fanatics, since it reduces out-of-vocabulary words.
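
As a rough illustration of the splitting described above, here is a minimal Python sketch. It approximates the hyphen rule with a regular expression and is not the tokenizer.perl code itself; the function names `split_hyphens` and `join_hyphens` are made up for this example.

```python
import re

# Assumption: a hyphen is split only when it sits directly between two
# alphanumeric characters; a free-standing " - " is left untouched.
_INNER_HYPHEN = re.compile(r"([^\W_])-(?=[^\W_])")


def split_hyphens(text: str) -> str:
    """Turn word-internal hyphens into the separate token '@-@'."""
    return _INNER_HYPHEN.sub(r"\1 @-@ ", text)


def join_hyphens(text: str) -> str:
    """Undo the split, as a de-tokenizer would, by rejoining ' @-@ '."""
    return text.replace(" @-@ ", "-")


if __name__ == "__main__":
    sentence = "an over-the-top example - with a plain dash"
    tokenized = split_hyphens(sentence)
    print(tokenized)
    print(join_hyphens(tokenized))
```

After splitting, "over", "the" and "top" are ordinary vocabulary items, which is why a hyphenated form unseen in training no longer has to be treated as a single unknown word.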

-phi

On 23 Jan 2012 14:42, "Eleftherios Avramidis" wrote:

    Hi all,

    I noticed that tokenizer.perl now does "aggressive" hyphen
    splitting, an
    option which is enabled by default for users of EMS. This produces
    hyphens that look like @-@, which of course later appear in the word
    alignments and the decoding results.
    Does somebody know why this option would be useful?

    cheers
    Lefteris

    --
    MSc. Inf. Eleftherios Avramidis
    DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
    Tel. +49-30 238 95-1806

    Fax. +49-30 238 95-1810

    
-------------------------------------------------------------------------------------------
    Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
    Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern

    Geschaeftsfuehrung:
    Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
    Dr. Walter Olthoff

    Vorsitzender des Aufsichtsrats:
    Prof. Dr. h.c. Hans A. Aukes

    Amtsgericht Kaiserslautern, HRB 2313
    
-------------------------------------------------------------------------------------------

    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support


