Re: [Moses-support] word alignment quality and symmetrization

Miles Osborne Fri, 06 Mar 2009 01:13:31 -0800

people use word alignments (actually phrases) for paraphrasing, so
there would be a connection between word alignment quality and
paraphrasing performance.  for an example, see Chris' PhD and
subsequent developments:


http://www.cs.jhu.edu/~ccb/

Miles

2009/3/6 J.Tiedemann <[email protected]>:
>
> Hi!
>
> thanks for all the replies. and thanks for the interesting paper on
> AER and the relation to BLEU scores. quite embarrassing that I haven't
> seen it before.
>
> Alexander: I would be interested in your gold standard data.  would be
> nice if you could make them available.
>
> it remains a tricky business with the word alignment evaluation. what
> would be the best way to compare results with previously reported
> experiments? most people did use AER as you also mention in your
> paper. from your discussion I conclude that for english-french an
> F-measure with alpha=0.4 would be a good setting. (to be sure: you
> mean the harmonic mean and not the geometric mean, right?) but what
> would be the right thing to do to compare results on standard sets?
>
> By the way, are there any other studies on the influence of word
> alignment quality for other purposes than standard SMT? I was again
> thinking of approaches like Hiero, SAMT, maybe tree alignment and
> other types of transfer rule extraction, annotation/grammar
> projection,  bilingual lexicon/terminology extraction etc.
>
> I'm just curious.
> cheers,
>
> jorg
>
>>
>> Here is the longer answer to the question you didn't ask :-)
>>
>> 1) AER is broken for Sure and Possible links and can be gamed by
>> guessing fewer links. If you must use Sure vs. Possible alignments,
>> use Och and Ney's definition of Precision and Recall, and take 1 -
>> the geometric mean. (See our CL squib, kindly already cited by Adam,
>> for more details).
>>
>> 2) The gold standard alignment set is broken (I assume we are
>> talking about French/English btw, I think there was also
>> German/English which I am not familiar with). There are 4376 Sure
>> links and 19222 Possible links. Franz told me that the way this was
>> generated is that two annotators both annotated the set.
>> Intersection of the annotators was marked Sure, and union of the
>> annotators was marked Possible. So the interannotator agreement was
>> really low. This was not done using a GUI, btw, but instead by
>> typing in offsets.
>>
>> 3) Sure vs. Possible_and_not_Sure is a nebulous distinction (see
>> above). If you would like the first 220 sentences of the set
>> reannotated as Sure only (in the spirit of Melamed's Blinker
>> guidelines), I can make those available. They worked better for
>> predicting MT performance.
>>
>> 4) The sentences annotated were sampled from the LDC Hansard, not
>> the ISI Hansard; results using the ISI Hansard are not directly
>> comparable (the gold standard alignments are also mismatched in
>> time, I don't know if this is important).
>>
>> 5) There are French/English alignments available for Europarl,
>> perhaps you should be using these instead? They use Sure vs.
>> Possible unfortunately. I don't know if they had French or English
>> native speakers, so YMMV. Not to criticize though, I bet there are
>> errors in my annotation as well. Many thanks to those guys for
>> releasing their work!!
>>
>> https://www.l2f.inesc-id.pt/wiki/index.php/Word_Alignments
>>
>> 6) I would use unbalanced F-Measure rather than balanced F-Measure
>> (see again the squib, this is the main point of it). For
>> applications where precision is more important (such as
>> cross-lingual retrieval), increase alpha to weight precision more.
>>
>> Cheers, Alex
>> ---
>> Alexander Fraser
>> Institute for Natural Language Processing
>> University of Stuttgart
>> Azenbergstrasse 12
>> 70174 Stuttgart, Germany
>>
>> phone: +49 (711) 685-81375
>> fax:   +49 (711) 685-71400
>> email: [email protected]
>> web:   http://www.ims.uni-stuttgart.de/~fraser
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
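
For readers who want to try the measures discussed in the quoted
message, here is a minimal sketch in Python. It assumes the usual Och
and Ney definitions (Precision against the Possible links, Recall
against the Sure links) and the alpha-weighted F-measure, with Sure and
Possible built as the intersection and union of two annotators as
described in point 2. All function names and the toy links are
hypothetical, chosen for illustration only; this is not code from any
of the posters or from Moses.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word position, target word position)


def gold_from_annotators(a: Set[Link], b: Set[Link]):
    """Sure = intersection, Possible = union of two annotations."""
    return a & b, a | b


def precision(hyp: Set[Link], possible: Set[Link]) -> float:
    """Share of hypothesized links that are at least Possible."""
    return len(hyp & possible) / len(hyp) if hyp else 0.0


def recall(hyp: Set[Link], sure: Set[Link]) -> float:
    """Share of Sure links that the hypothesis recovers."""
    return len(hyp & sure) / len(sure) if sure else 0.0


def aer(hyp: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment Error Rate: 1 - (|A&S| + |A&P|) / (|A| + |S|)."""
    denom = len(hyp) + len(sure)
    if denom == 0:
        return 0.0
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / denom


def f_measure(p: float, r: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean; a larger alpha weights precision more."""
    if p == 0.0 or r == 0.0:
        return 0.0
    return 1.0 / (alpha / p + (1.0 - alpha) / r)


# Toy annotations (made up): the annotators agree on two links only.
annotator_a = {(0, 0), (1, 1), (2, 2), (3, 3)}
annotator_b = {(0, 0), (1, 1), (2, 3), (3, 2)}
sure, possible = gold_from_annotators(annotator_a, annotator_b)

# A hypothesis with only the Sure links and one with every Possible
# link get the same AER, although their recall of Possible links
# differs a lot.
aer_sure_only = aer(sure, sure, possible)
aer_all_possible = aer(possible, sure, possible)

# A one-link hypothesis: perfect precision, half the Sure recall.
hyp_small = {(0, 0)}
p_small = precision(hyp_small, possible)
r_small = recall(hyp_small, sure)
f_balanced = f_measure(p_small, r_small, alpha=0.5)
f_alpha_04 = f_measure(p_small, r_small, alpha=0.4)

print(aer_sure_only, aer_all_possible)
print(p_small, r_small, f_balanced, f_alpha_04)
```

In this toy case AER cannot tell the two-link hypothesis from the
six-link one, while the F-measure with alpha below 0.5 penalises the
low-recall hypothesis more than the balanced version does.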



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
