Thanks, Mark.

My results are still off. My data is encoded in UTF-8. Your script reports an actual BLEU score of 0.024447 for my hypothesis 1. mteval reports 0.2459 with the -e option and 0.0268 without it. So your script roughly agrees with mteval run without -e, while the score with -e is higher by an order of magnitude. I can't read the language, so I can't tell which score is more accurate. Is the -e option to mteval bogus?
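To make the effect of the substitution s/([^[:ascii:]])/ $1 /g (quoted below) concrete, here is a minimal Python sketch. It is not the bootstrap script or mteval, and it only computes clipped unigram precision rather than full BLEU. The function names and the two sample sentences are made up for illustration. It suggests one possible reason for the gap: once every non-ASCII character becomes its own token, matching happens at the character level, where partial word matches still count.

```python
import re
from collections import Counter

# Python counterpart of the Perl substitution s/([^[:ascii:]])/ $1 /g
NON_ASCII = re.compile(r"([^\x00-\x7f])")


def wrap_non_ascii(line):
    """Surround every non-ASCII character with spaces."""
    return NON_ASCII.sub(r" \1 ", line)


def tokens(line, wrap):
    """Split on whitespace, optionally after wrapping non-ASCII characters."""
    return (wrap_non_ascii(line) if wrap else line).split()


def unigram_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision, the 1-gram ingredient of BLEU."""
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    matched = sum(min(n, ref_counts[tok]) for tok, n in hyp_counts.items())
    return matched / len(hyp_tokens)


# Hypothetical sample data: the hypothesis gets one word form wrong.
hyp = "привет мир"
ref = "привет миру"

for wrap in (False, True):
    p = unigram_precision(tokens(hyp, wrap), tokens(ref, wrap))
    print("wrap non-ASCII:", wrap, "unigram precision:", p)
```

On this toy pair the word-level precision is 0.5, while the character-level precision after wrapping is 1.0, because the mismatched word still shares all of its characters with the reference. Real BLEU also uses higher-order n-grams and a brevity penalty, so this only shows the direction of the effect, not its size.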
John

On 11/29/10, Mark Fishel <[email protected]> wrote:
> Hi John,
>
> Thanks for pointing out the issue; I added support for arbitrary
> encodings to the script, by default it's set to UTF8 but you can
> change the global variable on line 23 for other encodings; just update
> the file from SVN.
>
> Treating non-ascii characters as separate tokens by wrapping them in
> spaces should not be the right thing to do in the general case, as far
> as I understand.
>
> Best,
> Mark
>
> On Mon, Nov 29, 2010 at 12:34 AM, John Morgan
> <[email protected]> wrote:
>> Hi,
>> I'd like to use the script
>> bootstrap-hypothesis-difference-significance.pl
>> to compare 2 systems that translate from English into languages that
>> use non-ascii character encodings.
>> I think this script is written for English hypothesis and reference files.
>> I guess that an option similar to the -e option to mteval needs to be
>> added to the script to make it work for non-ascii files.
>> I added the following line to the script at line 240 after the "while"
>> statement slurps in a line from the opened file:
>> s/([^[:ascii:]])/ $1 /g
>> It looks like this is all the -e option to mteval does.
>> I have 2 questions:
>> Is this the correct way to get the bootstrap script to work on
>> non-ascii text files?
>> If yes, can anyone explain to me why?
>> Why do we need to wrap white space around nonascii characters?
>>
>> When I do this the BLEU scores look reasonable (but I could be fooling
>> myself).
>>
>>
>> --
>> Regards,
>> John J Morgan
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>

--
Regards,
John J Morgan
