Hi Vincent, all,

Not sure if this would be useful to you. I wrote an evaluation program to compare two different segmentations for the MLP 2017 shared tasks: http://mlp.computing.dcu.ie/mlp2017_Shared_Task.html
It will be open-sourced, but for now here is the executable with examples. (Note: change the .z00 extension to .zip and decompress it.)

java -cp . chliu.segmenter17.Evaluator
    language=word-segmentation
    oracle=__examples_seg-evaluation__\tchinese17oracle.segd.txt
    prediction=__examples_seg-evaluation__\tchinese17predict.txt
    output=__examples_seg-evaluation__\tchinese17predict.evaluation.txt

where
1. language= : for your application this would be 'word-segmentation'; the program also evaluates morpheme segmentation
2. oracle= : specifies the file with the 'correct' segmentation
3. prediction= : specifies the file with the segmentation to be evaluated
4. output= : specifies the file name for the evaluation report

In this example, the evaluation file 'tchinese17predict.evaluation.txt' contains the following information:

Oracle count: [95]
Predict count: [69]
Correct count: [64]
Precision: [0.927536231884058]
Recall: [0.6736842105263158]
F1: [0.7804878048780488]

It also outputs a file called 'tchinese17predict.evaluation.txt.prf1s.txt' with six columns of values per line:

8 5 2 0.4 0.25 0.3076923076923077
3 2 0 0.0 0.0 -1.0

These are the numbers of "oracle separators", "predicted separators" and "correctly predicted separators", followed by "precision", "recall" and "F1". If a score is not computable, it is set to -1.0. (A rough sketch of this separator-level scoring is appended below, after the quoted message.)

Another output file, 'tchinese17predict.evaluation.txt.unmatched.txt', records the sentences whose segmentation does not match the oracle, together with their line numbers:

Line: 0
Annotation: [泰伯 第十二 章 這 一 章 書 的 講表]
Prediction: [泰伯 第十 二章 這一章 書的 講表]
Line: 1
Annotation: [濃郁 的 奶酪 風味]
Prediction: [濃郁的 奶酪 風味 ]

Best regards,
Chao-Hong

Chao-Hong Liu | Postdoctoral Researcher
ADAPT Centre, School of Computing
Dublin City University, Dublin 9, Ireland
m: +353 (0) 89 247 3035
e: [email protected]
www.adaptcentre.ie

On Fri, Aug 25, 2017 at 9:53 AM, Vincent Vandeghinste <[email protected]> wrote:

> Hi all,
>
> In the framework of MT from speech, I am working on automatic segmentation
> of the recognized speech before sending it to the MT. In order to evaluate
> the end quality of the MT (and the segmentation), I can no longer use the
> standard MT metrics with references, as my MT output has a different
> segmentation. I know there is a paper by Matusov (2005) at IWSLT that
> describes how to solve this problem, and was wondering if anyone has a
> script that allows doing this, perhaps anyone involved in IWSLT? It would
> be good for replicability if such an evaluation script became available.
>
> Thanks a million,
>
> kind regards,
>
> v.
>
> _______________________________________________
> Mt-list site list
> [email protected]
> http://lists.eamt.org/mailman/listinfo/mt-list
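As a pointer to what those six columns mean, here is a minimal, unofficial sketch in Java of separator-level precision/recall/F1 between two segmentations of the same sentence. It is not the Evaluator's actual code: the class name, the example strings, and the exact definition of a "correctly predicted separator" (here, matching character offsets in the whitespace-free string) are my own assumptions, so it need not reproduce the counts shown above.

import java.util.HashSet;
import java.util.Set;

public class SegEvalSketch {

    // Character offsets (in the whitespace-free string) after which a token
    // boundary occurs; the sentence-final position is not counted as a separator.
    static Set<Integer> separatorOffsets(String segmented) {
        Set<Integer> offsets = new HashSet<>();
        String[] tokens = segmented.trim().split("\\s+");
        int chars = 0;
        for (int i = 0; i < tokens.length - 1; i++) {
            chars += tokens[i].length();
            offsets.add(chars);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical oracle/prediction pair, for illustration only.
        String oracle     = "這 是 一 個 例子";
        String prediction = "這是 一個 例子";

        Set<Integer> oracleSeps  = separatorOffsets(oracle);
        Set<Integer> predictSeps = separatorOffsets(prediction);
        Set<Integer> correctSeps = new HashSet<>(predictSeps);
        correctSeps.retainAll(oracleSeps);   // separators present in both

        // Precision, recall and F1 over separators; -1.0 when undefined,
        // mirroring the convention of the .prf1s.txt output.
        double p  = predictSeps.isEmpty() ? -1.0 : (double) correctSeps.size() / predictSeps.size();
        double r  = oracleSeps.isEmpty()  ? -1.0 : (double) correctSeps.size() / oracleSeps.size();
        double f1 = (p <= 0.0 || r <= 0.0) ? -1.0 : 2.0 * p * r / (p + r);

        // Same column order as the .prf1s.txt file: oracle, predicted and
        // correct separator counts, then precision, recall, F1.
        System.out.printf("%d %d %d %s %s %s%n",
                oracleSeps.size(), predictSeps.size(), correctSeps.size(), p, r, f1);
    }
}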
Attachment: seg17evaluator_20170825e.z00 (binary data)
_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list
