Hi all,

Can anybody help with this?

There might have been some mistakes during the training of Moses engine 2
(it used the source-side language model), but the criterion for selecting
the sample data for BLEU/NIST score evaluation is still something I want
to understand / make sure of.
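
For reference, one common way to pick evaluation data is to hold out a
random slice of the in-domain parallel corpus *before* training, so the
test set follows the same domain distribution as the training data but is
never seen by either engine. Below is a rough sketch in Python; the file
names corpus.src / corpus.trg and the 1000-line test size are only
assumptions for illustration, not part of any Moses script:

    import random

    random.seed(1)  # fixed seed so the split is reproducible

    # read the parallel corpus line by line, keeping source/target aligned
    with open("corpus.src") as f_src, open("corpus.trg") as f_trg:
        pairs = list(zip(f_src, f_trg))

    random.shuffle(pairs)  # mix the corpus so the sample is not one document
    test, train = pairs[:1000], pairs[1000:]  # held-out test set vs. training data

    # write the two splits back out as aligned plain-text files
    for name, subset in (("test", test), ("train", train)):
        with open(name + ".src", "w") as out_src, open(name + ".trg", "w") as out_trg:
            for src, trg in subset:
                out_src.write(src)
                out_trg.write(trg)

The key points are that the test sentences come from the same domain as the
training data but are excluded from training, and that both engines are
scored on exactly the same held-out set.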

Thanks so much,
Wenlong


2010/8/7, Wenlong Yang <[email protected]>:
> Hi all,
>
> can any of you help by pointing me to some materials about how to select
> the sample data for BLEU/NIST evaluation?
> I mean, how many lines of data should I choose for the evaluation, and how
> can I choose the data so that it is more representative of our domain/use?
>
>
> I have tried to generate BLEU scores using a 1000-line sample and a
> 12000-line sample, both of which are in our domain, but the second
> evaluation has higher scores; does this make sense?
> I actually trained two Moses engines. For the first evaluation (1000 lines),
> Moses Engine 1's score is lower than Moses Engine 2's; but for the second
> evaluation (12000 lines), Moses Engine 1's score is higher than Moses
> Engine 2's.
> Which result should I trust? This phenomenon makes me trust the scores
> less.
>
> Does anybody have similar experiences? Is there a problem with my
> evaluation data?
> How can I generate more accurate scores?
>
> Thanks so much,
> Wenlong
>
