Thanks, Vincent.

An earlier Moses Perl script used an SGM file as a template, but it was limited. I never found a good tool to create SGM files for mteval from scratch, and it's just as hard to find documentation for the mteval scripts. That's why I created this tool.

Your two option suggestions are interesting, but I'm not sure they're practical. A head or tail of -nb lines would be straightforward, as long as you keep all three data sets (src, ref, tst) in sync. Doing that for a random selection is more involved. I'll have some time late next week to look at these options.
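One way to keep src, ref and tst in sync is to choose the line indices once and apply the same indices to all three lists. The sketch below assumes the three files have already been read into Python lists of equal length; the function name `select_parallel` and its parameters are hypothetical, not options of the existing tool.

```python
import random


def select_parallel(src, ref, tst, nb, selection="head", seed=None):
    """Pick nb aligned segments from three parallel line lists.

    selection is "head" (first nb lines), "tail" (last nb lines) or
    "random". The same indices are used for src, ref and tst, so the
    three sets stay aligned segment by segment.
    """
    if not (len(src) == len(ref) == len(tst)):
        raise ValueError("src, ref and tst must have the same number of lines")
    if nb < 0:
        raise ValueError("nb must be non-negative")
    n = len(src)
    nb = min(nb, n)
    if selection == "head":
        idx = list(range(nb))
    elif selection == "tail":
        idx = list(range(n - nb, n))
    elif selection == "random":
        rng = random.Random(seed)  # fixed seed makes the sample reproducible
        # Sample indices once, then sort so segments keep their original order.
        idx = sorted(rng.sample(range(n), nb))
    else:
        raise ValueError(f"unknown selection: {selection!r}")
    return ([src[i] for i in idx], [ref[i] for i in idx], [tst[i] for i in idx])
```

For example, `select_parallel(src, ref, tst, 500, "random", seed=42)` returns three 500-line lists drawn from the same positions. Sampling indices rather than lines is what makes the random case no harder than head or tail.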

Tom


On 9/14/2015 4:30 PM, [email protected] wrote:
Date: Mon, 14 Sep 2015 08:55:28 +0200
From: Vincent Nguyen<[email protected]>
Subject: Re: [Moses-support] sgm generation for personalized test sets
To:[email protected]

Hi Tom,

If this script is intended exactly and only to generate sgm test/dev
files from txt file then yes it needs to be amended.

1) line breaks other than 0A need to be removed before running the
Python script (byte-stream replace)

2) even though the XML standard is to replace ' with &apos; and so on
for other characters, I have noticed that none of the test/dev sets
include XML codes like &apos;, so I removed the second string replace
in your code. However, I added 2 other replaces in the first
sequence: &nbsp; => " " and &#160; => " "

3) even though this is standard for XML, I removed the first 3 lines
of the doc (XML, DOCTYPE and MTEVAL) and also the last one (MTEVAL)

All of this is to match the expected format of the test set files.
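A minimal Python sketch of steps 1 to 3, under several assumptions: the input is UTF-8 text; "line breaks other than 0A" is taken to mean the characters Python's `str.splitlines()` also splits on (the list in `EXTRA_BREAKS` is an assumption); existing entities are decoded before re-escaping so text is not double-escaped (the `ENTITIES` table is an assumption); and the `srcset`/`refset`/`tstset`, `doc` and `seg` tags with `setid`, `srclang`, `trglang`, `docid` and `sysid` attributes are modeled on typical WMT test sets. The function names are hypothetical; this is not the actual code of the tool discussed above.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# Assumed set of extra line-break characters to strip; "\n" (0A) is kept.
EXTRA_BREAKS = ["\r", "\x0b", "\x0c", "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]
# Assumed decoding table: &nbsp; and &#160; become a plain space.
ENTITIES = {"&apos;": "'", "&quot;": '"', "&nbsp;": " ", "&#160;": " "}
TAGS = {"src": "srcset", "ref": "refset", "tst": "tstset"}


def read_lines(path, encoding="utf-8"):
    # newline="" stops Python translating "\r"; split on "\n" only.
    with open(path, encoding=encoding, newline="") as f:
        lines = f.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty item after a final newline
    return lines


def clean_line(line):
    for ch in EXTRA_BREAKS:  # step 1: remove stray line breaks
        line = line.replace(ch, "")
    line = unescape(line, ENTITIES).replace("\u00a0", " ")
    # step 2: escape() only touches &, < and >, so no &apos; or &quot;
    return escape(line)


def to_sgm(lines, kind, setid, srclang, trglang="", sysid="ref", docid="doc1"):
    # step 3: no XML declaration, DOCTYPE or <mteval> wrapper
    set_attrs = f"setid={quoteattr(setid)} srclang={quoteattr(srclang)}"
    doc_attrs = f"docid={quoteattr(docid)}"
    if kind != "src":  # ref and tst sets also carry a target language and sysid
        set_attrs += f" trglang={quoteattr(trglang)}"
        doc_attrs += f" sysid={quoteattr(sysid)}"
    out = [f"<{TAGS[kind]} {set_attrs}>", f"<doc {doc_attrs}>"]
    for i, line in enumerate(lines, start=1):
        out.append(f'<seg id="{i}">{clean_line(line)}</seg>')
    out += ["</doc>", f"</{TAGS[kind]}>"]
    return "\n".join(out) + "\n"
```

For example, `to_sgm(read_lines("test.en"), "src", "mytest", "en")` produces a source file, and the same lines passed with `kind="ref"` or `kind="tst"` (plus `trglang` and, for tst, a system name in `sysid`) produce the matching reference and test files.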

If you have the chance, you could add 2 options:
- nb = number of lines you want to take from the file
- selection = either the first nb lines or a random selection from the txt file

I am just wondering whether someone has already developed another Perl
script. How were the sets generated to start with?

cheers,
Vincent

--
Best regards,

Tom Hoar
Chief Executive Officer
Precision Translation Tools Pte Ltd
Singapore/Thailand
Web: www.precisiontranslationtools.com <http://www.precisiontranslationtools.com>
Thailand Mobile: +66 87 345-1875
Skype: tahoar
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
