Hi Jorg,

Sorry for the gap in the message thread - I was on vacation in Vienna
(fun city!).

I'll send you the French/English data and my alignment editor in a
separate message.

> it remains a tricky business with the word alignment evaluation. what would
> be the best way to compare results with previously reported experiments?
> most people did use AER as you also mention in your paper. from your
> discussion I conclude that for english-french an F-measure with alpha=0.4
> would be a good setting. (to be sure: you mean the harmonic mean and not the
> geometric mean, right) but what would be the right thing to do to compare
> results on standard sets?

If you have to evaluate intrinsically, then report precision and recall
on the standard sets. A lot of the published work seems to improve
only precision, which doesn't seem to help MT (but might help
cross-lingual retrieval, for instance). Improvements in both
precision and recall are certainly going in the right direction.
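
For concreteness, here is a minimal sketch of how these intrinsic
numbers are usually computed from sets of alignment links. It assumes
gold links annotated as sure/possible, AER as commonly defined over
those sets, and the alpha-weighted F-measure in its weighted harmonic
mean form, 1 / (alpha/P + (1 - alpha)/R). The function name and the
toy links are made up for illustration and are not taken from any
toolkit.

```python
def alignment_scores(hyp, sure, possible=None, alpha=0.5):
    """Score hypothesized alignment links against gold links.

    hyp, sure, possible: iterables of (source_index, target_index) pairs.
    If possible is None, only sure links are used (possible == sure).
    alpha weights precision; 1 - alpha weights recall.
    """
    hyp, sure = set(hyp), set(sure)
    # By convention the possible set contains the sure set.
    possible = sure if possible is None else set(possible) | sure
    if not hyp or not sure:
        raise ValueError("hyp and sure must both be non-empty")

    precision = len(hyp & possible) / len(hyp)
    recall = len(hyp & sure) / len(sure)
    aer = 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))

    if precision == 0.0 or recall == 0.0:
        f_measure = 0.0
    else:
        # Weighted harmonic mean of precision and recall.
        f_measure = 1.0 / (alpha / precision + (1.0 - alpha) / recall)

    return {"precision": precision, "recall": recall,
            "aer": aer, "f_measure": f_measure}


# Toy example (hypothetical links): 3 hypothesized links, 4 sure gold links.
hyp_links = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
scores = alignment_scores(hyp_links, sure_links, alpha=0.4)
print(scores)
```

With alpha below 0.5 the score leans towards recall, so a change that
raises only precision moves it less than a balanced F-measure would.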

However, I'd be more convinced by MT results, or results from another
external application of interest, as Miles already argued.

> By the way, are there any other studies on the influence of word alignment
> quality for other purposes than standard SMT? I was again thinking of
> approaches like Hiero, SAMT, maybe tree alignment and other types of
> transfer rule extraction, annotation/grammar projection,  bilingual
> lexicon/terminology extraction etc.

Our 2007 EMNLP paper (with Daniel Marcu) on the LEAF model shows
improved MT performance on an Arabic-to-English HIERO system that was
submitted to the NIST evaluation (as well as a French-to-English
phrase-based system).

In addition to HIERO, ISI has also used LEAF for string-to-tree SAMT
in their Arabic-to-English and Chinese-to-English NIST systems, where
it shows consistent gains.

Interestingly, for Chinese-to-English, the alignments don't help
phrase-based systems very much, despite being useful for the
string-to-tree system. That might be because the general paradigm of
phrases-with-gaps needs better alignment quality than phrases, but it
is hard to explain why this wouldn't be the case for
Arabic-to-English. It might have to do with Chinese linguistic issues;
see the papers from Victoria Fossum and Kevin Knight if you are
interested in a discussion of this. They produce alignments that
improve MT based on those observations.

Cheers, Alex

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
