Dear Vincent,
At the IWSLT evaluation campaign we face the same situation. For this evaluation, we use a toolkit developed by RWTH to align the system output with the reference. Afterwards, any machine translation metric can be used to score the resegmented output against the reference.

The tool is available at:
https://www-i6.informatik.rwth-aachen.de/web/Software/mwerSegmenter.tar.gz

The technique is described in:
Matusov, E., Leusch, G., Bender, O., & Ney, H. (2005). Evaluating machine translation output with automatic sentence segmentation. IWSLT, 138–144.

Best,
Jan

From: Mt-list <[email protected]> On Behalf Of Vincent Vandeghinste
Sent: Friday, May 8, 2020 8:14 PM
To: Mt List <[email protected]>
Subject: [Mt-list] Sentence aligning speech translation with reference

Dear MT'ers,

Maybe some of you can answer the following question:

I have a speech recognition based translation of a speech, with punctuation predictions etc. I have a sentence-based reference translation, one sentence per line. The sentence predictions of the speech translation system do not necessarily match the sentences of the reference file.

How can I align my speech translation with the reference sentences so I can calculate BLEU scores and the like? Are there any scripts available for that? Or papers?

Thank you,
kind regards,
Vincent Vandeghinste
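
To illustrate the resegmentation step mentioned in the reply (aligning the system output with the reference before scoring), here is a minimal Python sketch. It is not the mwerSegmenter tool and does not reproduce its implementation. It only shows the general idea, under the assumption of a single reference and whitespace tokenization: run a word-level Levenshtein alignment between the concatenated reference and the unsegmented hypothesis, then cut the hypothesis where the alignment path crosses each reference sentence boundary. The function names `edit_distance_table` and `resegment` are hypothetical and chosen for this example.

```python
from typing import List


def edit_distance_table(ref: List[str], hyp: List[str]) -> List[List[int]]:
    """dist[i][j] = word-level edit distance between ref[:i] and hyp[:j]."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match/substitution
                dist[i - 1][j] + 1,  # reference word missing from hypothesis
                dist[i][j - 1] + 1,  # extra hypothesis word
            )
    return dist


def resegment(ref_segments: List[List[str]], hyp_words: List[str]) -> List[List[str]]:
    """Split hyp_words into len(ref_segments) pieces following the reference boundaries."""
    ref_words = [w for seg in ref_segments for w in seg]
    dist = edit_distance_table(ref_words, hyp_words)

    # Backtrace one optimal alignment path. cut[i] is the hypothesis position
    # at which the path first reaches reference position i (seen from the end).
    i, j = len(ref_words), len(hyp_words)
    cut = {i: j}
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1])):
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
        cut.setdefault(i, j)

    # Cut the hypothesis at the positions aligned to each reference sentence end.
    segments, start, ref_end = [], 0, 0
    for seg in ref_segments:
        ref_end += len(seg)
        end = cut[ref_end]
        segments.append(hyp_words[start:end])
        start = end
    return segments


if __name__ == "__main__":
    refs = [["the", "cat", "sat"], ["on", "the", "mat"]]
    hyp = "the cat sat on a mat".split()
    for segment in resegment(refs, hyp):
        print(" ".join(segment))
```

The resegmented output then has one line per reference sentence, so it can be passed to any sentence-aligned metric such as BLEU. The table is quadratic in the number of words, so for long talks a real tool would be preferable to this sketch.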
