Hi Jorg,

Sorry for the gap in the message thread - I was on vacation in Vienna (fun city!).
I'll send you the French/English data and my alignment editor in a separate message.

> it remains a tricky business with the word alignment evaluation. what would
> be the best way to compare results with previously reported experiments?
> most people did use AER as you also mention in your paper. from your
> discussion I conclude that for english-french an F-measure with alpha=0.4
> would be a good setting. (to be sure: you mean the harmonic mean and not the
> geometric mean, right) but what would be the right thing to do to compare
> results on standard sets?

If you have to evaluate intrinsically, then report precision and recall on
the standard sets. It seems like a lot of the published work improves only
precision, which doesn't seem to help MT (but might help cross-lingual
retrieval, for instance). But certainly improvements in both precision and
recall are going in the right direction. However, I'd be more convinced by
MT results, or results from another external application of interest, as
Miles already argued.

> By the way, are there any other studies on the influence of word alignment
> quality for other purposes than standard SMT? I was again thinking of
> approaches like Hiero, SAMT, maybe tree alignment and other types of
> transfer rule extraction, annotation/grammar projection, bilingual
> lexicon/terminology extraction etc.

Our 2007 EMNLP paper (with Daniel Marcu) on the LEAF model shows improved MT
performance on an Arabic-to-English HIERO system that was submitted to the
NIST evaluation (as well as a French-to-English phrase-based system). Beyond
HIERO, ISI has also used LEAF for string-to-tree SAMT in their
Arabic-to-English and Chinese-to-English NIST systems, where it shows
consistent gains. Interestingly, for Chinese-to-English, the alignments
don't help phrase-based systems very much, despite being useful for the
string-to-tree system. That might be because the general paradigm of
phrases-with-gaps needs better alignment quality than phrases, but it is
hard to explain why this wouldn't be the case for Arabic-to-English. It
might have to do with Chinese linguistic issues; see the papers from
Victoria Fossum and Kevin Knight if you are interested in a discussion of
this. They produce alignments that improve MT based on those observations.

Cheers,
Alex

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
