Hi Vincent, all,

Not sure if this would be useful to you. I wrote an evaluation program to
compare two different segmentations for the MLP 2017 shared tasks.
http://mlp.computing.dcu.ie/mlp2017_Shared_Task.html

It will be open-sourced, but for now here is the executable with examples.
(Note: change the .z00 extension to .zip and decompress it.)

java -cp . chliu.segmenter17.Evaluator language=word-segmentation
oracle=__examples_seg-evaluation__\tchinese17oracle.segd.txt
prediction=__examples_seg-evaluation__\tchinese17predict.txt
output=__examples_seg-evaluation__\tchinese17predict.evaluation.txt

where
1. language= : for your application this would be 'word-segmentation'; the
program also evaluates morpheme segmentation
2. oracle= : specifies the file with the 'correct' segmentation
3. prediction= : specifies the file with the segmentation to be evaluated
4. output= : specifies the file name for the evaluation report

In this example, the evaluation file 'tchinese17predict.evaluation.txt'
contains the following information:
Oracle count:  [95]
Predict count: [69]
Correct count: [64]
  Precision: [0.927536231884058]
  Recall:    [0.6736842105263158]
  F1:        [0.7804878048780488]
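
(For reference, these follow directly from the counts above: Precision =
64/69, Recall = 64/95, and F1 = 2*64/(95+69) = 128/164.)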

It also outputs a file called 'tchinese17predict.evaluation.txt.prf1s.txt'
with 6 columns of values, for example:
8 5 2 0.4 0.25 0.3076923076923077
3 2 0 0.0 0.0 -1.0

These are the numbers of "oracle separators", "predicted separators" and
"correctly predicted separators", followed by "precision", "recall" and
"f1". If a score is not computable, it is set to -1.0.
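
In case it helps to see the arithmetic spelled out, below is a minimal Java
sketch of how such per-line scores can be derived from the three separator
counts. This is only an illustration with made-up class/method names, not the
Evaluator's actual code; in particular, how a predicted separator is judged
"correct" is left to the tool.

import java.util.Arrays;

// Sketch only: P/R/F1 from separator counts, with -1.0 when a score is not computable.
public class SegScoresSketch {

    // Number of separators in a space-segmented line, i.e. token count minus one.
    static int separatorCount(String segmentedLine) {
        String[] tokens = segmentedLine.trim().split("\\s+");
        return Math.max(tokens.length - 1, 0);
    }

    // Returns {precision, recall, f1}.
    static double[] prf1(int oracleSeps, int predictedSeps, int correctSeps) {
        double p = predictedSeps > 0 ? (double) correctSeps / predictedSeps : -1.0;
        double r = oracleSeps > 0 ? (double) correctSeps / oracleSeps : -1.0;
        double f1 = (p > 0.0 && r > 0.0) ? 2 * p * r / (p + r) : -1.0;
        return new double[] { p, r, f1 };
    }

    public static void main(String[] args) {
        System.out.println(separatorCount("濃郁 的 奶酪 風味"));   // 3 (oracle separators of Line 1 below)
        System.out.println(Arrays.toString(prf1(8, 5, 2)));    // ~[0.4, 0.25, 0.3077] (first row above)
        System.out.println(Arrays.toString(prf1(3, 2, 0)));    // [0.0, 0.0, -1.0] (second row above)
        System.out.println(Arrays.toString(prf1(95, 69, 64))); // ~[0.9275, 0.6737, 0.7805] (summary above)
    }
}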

Another output file, 'tchinese17predict.evaluation.txt.unmatched.txt',
records the sentences whose segmentation does not match the oracle, together
with their line numbers. (The two sentences below are the ones behind the two
.prf1s.txt rows shown above.)

Line: 0
  Annotation: [泰伯 第十二 章 這 一 章 書 的 講表]
  Prediction: [泰伯 第十 二章 這一章 書的 講表]

Line: 1
  Annotation: [濃郁 的 奶酪 風味]
  Prediction: [濃郁的 奶酪 風味 ]

Best regards,

Chao-Hong

Chao-Hong Liu | Postdoctoral Researcher
ADAPT Centre, School of Computing
Dublin City University, Dublin 9, Ireland
m: +353 (0) 89 247 3035
e: [email protected]
www.adaptcentre.ie

On Fri, Aug 25, 2017 at 9:53 AM, Vincent Vandeghinste
<[email protected]> wrote:

> Hi all,
>
> In the framework of MT from speech, I am working on automatic segmentation
> of the recognized speech before sending it to the MT. In order to evaluate
> the end quality of the MT (and the segmentation), I can no longer use the
> standard MT metrics with references, as my MT output has a different
> segmentation. I know there is a paper by Matusov (2005) at IWSLT that
> describes how to solve this problem, and was wondering if there is anyone
> who has a script that allows doing this, perhaps anyone involved in IWSLT?
> It would be good for replicability if such an evaluation script became
> available.
>
> Thanks a million,
>
> kind regards,
>
> v.
>

Attachment: seg17evaluator_20170825e.z00
Description: Binary data

_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list
