[Moses-support] I sign the train-model.perl petition... and more.

Tom Hoar Sat, 14 Feb 2015 08:04:26 -0800

Ken,

I'd sign your petition, but we're going further. We're now working on a 
significant upgrade to train-model.perl (and soon mert-moses.pl) to run 
on native Win64 (dev/testing on Strawberry Perl 64).

It's far enough along that train-model.perl properly handles drive 
letters, OS path separators (slash vs. backslash), auto-appends ".exe" 
extensions to binaries, etc. We've replaced most system calls like `rm`, 
`wc`, `time`, `cat`, etc. with native Perl code, and that work continues. 
It relies on the Moses binaries in the current relative paths and the 
(M)GIZA++ binaries in the -external-bin-dir path. The only non-Moses 
external binaries will be sort (gsort), split (gsplit), gzip, and bzcat 
(unless others pop up). We're testing with 32-bit Gow binaries (does 
anyone have the 64-bit binaries?). As on POSIX, these will have to be in 
the Win64 system path. I'm not sure what we'll do to manage symlinks. 
Suggestions welcome.
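A minimal Perl sketch of the kind of native replacements described above (path joining, ".exe" handling, `wc -l`, `rm`). The helper names `bin_path`, `count_lines` and `remove_files` are hypothetical and are not taken from train-model-x.perl; the sketch only assumes core Perl and File::Spec, and the fact that `$^O` is 'MSWin32' under Strawberry Perl on native Windows.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: path to a binary in $dir, appending ".exe" on
# native Windows ($^O is 'MSWin32' under Strawberry Perl).
sub bin_path {
    my ($dir, $name) = @_;
    $name .= '.exe' if $^O eq 'MSWin32' && $name !~ /\.exe\z/i;
    # catfile joins with "\" on Win32 and "/" on POSIX systems.
    return File::Spec->catfile($dir, $name);
}

# Native stand-in for `wc -l FILE`. Note: a last line without a trailing
# newline is also counted, whereas `wc -l` counts newline characters.
sub count_lines {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $n = 0;
    $n++ while <$fh>;
    close($fh);
    return $n;
}

# Native stand-in for `rm -f FILE ...`: silently skips missing files and
# returns the number of files actually removed.
sub remove_files {
    my @existing = grep { -e $_ } @_;
    my $removed = unlink @existing;
    warn "Could not remove all files: $!\n" if $removed != @existing;
    return $removed;
}
```

Because File::Spec picks the separator for the running OS, the same call yields a backslash-joined "...\mgiza.exe" on Win64 and a slash-joined ".../mgiza" on Linux, with no shell involved.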

To complement the Perl work, Jeroen is updating the phrase-based Moses 
code (maybe phrase-factored, too) to run on native Win64 (including lmplz 
and query). In the end, the entire train/tune/translate tool chain will 
run on native Win64 without Cygwin.

Back to your petition idea. In this upgrade, our original plan included 
adding return-code checking and pass-through (I was researching that as 
your message came in). We're adding formatted log 'print' statements 
immediately after the close() statements (or equivalents) to report the 
full paths of each step's final output files. Between return codes and 
screen-scrapers, any wrapper should be able to detect the success or 
failure of any step. We've also done a general cleanup (e.g., normalizing 
indentation to 4 spaces). The goal: maintain the existing POSIX 
(Linux/Mac) use case, add the native Win64 use case, AND improve 
reliability when run from wrappers -- all without changing the current 
business rules/SMT functions.
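A minimal sketch, in Perl, of return-code checking with pass-through plus a scrapeable "final output" log line. The helper names `run_step`, `run_or_exit` and `log_output` and the log format are hypothetical, not the actual train-model-x.perl code; only the "Executing:" prefix mirrors what train-model.perl already prints in the log quoted in the thread.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: run one external step and return an exit code
# suitable for passing through (0 = success).
sub run_step {
    my @cmd = @_;
    print STDERR "Executing: @cmd\n";
    system(@cmd);
    if ($? == -1) {                    # the program could not be started
        print STDERR "ERROR: failed to execute $cmd[0]: $!\n";
        return 127;
    }
    if ($? & 127) {                    # the program was killed by a signal
        my $sig = $? & 127;
        print STDERR "ERROR: $cmd[0] died with signal $sig\n";
        return 128 + $sig;             # shell-style convention
    }
    my $code = $? >> 8;                # normal exit status
    print STDERR "ERROR: $cmd[0] exited with code $code\n" if $code;
    return $code;
}

# Stop the whole script on the first failing step, passing the child's
# code through as this script's own exit status.
sub run_or_exit {
    my $code = run_step(@_);
    exit($code) if $code;
    return;
}

# After close() on a step's final output, log its full path in a fixed,
# easily scraped format. Returns the logged line.
sub log_output {
    my ($step, $path) = @_;
    my $line = sprintf("(%s) output: %s\n", $step, File::Spec->rel2abs($path));
    print STDERR $line;
    return $line;
}
```

With this split, a wrapper gets two independent signals: the script's own exit status (non-zero as soon as any step fails) and one "output:" line per completed step that it can match with a simple regular expression.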

We've changed the Perl script names: `train-model.perl` to 
`train-model-x.perl` and `mert-moses.pl` to `mert-moses-x.pl` ("x" for 
"cross-platform"). We'll add them to the trunk when they're ready (early 
March?). Hopefully enough people will test and validate them to confirm 
that they're reliable and robust. Maybe they can replace the current 
scripts? As of
today, steps 3, 4 and 9 are fully tested on Strawberry Win64 with Gow on 
WinXP64 and Wine64.
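For testers validating the new scripts, a wrapper can also check that each step's final output actually contains data. A 20-byte phrase-table file like the one in the quoted thread is typically just an empty gzip stream (10-byte header, 2-byte empty deflate block, 8-byte trailer), so a non-zero file size alone proves nothing. The sketch below assumes the core Perl module IO::Uncompress::Gunzip; `output_has_data` is a hypothetical helper name.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical validation helper: returns 1 if a (possibly gzipped)
# output file contains at least one line of data, else 0.
sub output_has_data {
    my ($path) = @_;
    return 0 unless -f $path && -s _;      # missing or zero-byte file
    if ($path =~ /\.gz\z/) {
        my $z = IO::Uncompress::Gunzip->new($path)
            or die "Cannot open $path: $GunzipError\n";
        my $line = $z->getline();          # undef for an empty stream
        $z->close();
        return defined $line ? 1 : 0;
    }
    return 1;                              # plain file with size > 0
}
```

A wrapper could call it on each path reported in the step logs and treat a 0 result as a failed step, even when the step itself exited with code 0.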

Comments, requests, volunteers from the general Moses community are welcome.

On 02/14/2015 09:54 PM, Kenneth Heafield wrote:
> Sign my petition to add return code checking to train-model.perl.
>
> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>> An empty phrase-table.gz file is usually the result of an ill-prepared
>> training corpus. Make sure you run the final corpus through
>> clean-corpus-n.perl.
>>
>>
>>
>> On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>>> Hello, everybody!
>>>
>>> I have a problem with moses. I created a big parallel corpus by
>>> concatenating a bunch of existing corpora from
>>> http://opus.lingfil.uu.se. After that I cleaned up the results (the
>>> tokenizer script reported some errors while creating tokens, so I
>>> deleted the error-prone rows from both parts).
>>>
>>> Then I started training the translation model with mgiza using this
>>> command:
>>>
>>> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>>> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>>> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>>> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>>> -external-bin-dir /opt/moses/mgiza >& training.out &
>>>
>>> After a week of work I have this at the end of training.out:
>>> (7) learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>>> (7.1) [no factors] learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>>> (7.2) building tables @ Sun Feb  8 15:30:35 MSK 2015
>>> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>>> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>>> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>>> msd wbe-msd-bidirectional-fe"
>>> Lexical Reordering Scorer
>>> scores lexical reordering models of several types (hierarchical,
>>> phrase-based and word-based-extraction
>>> (8) learn generation model @ Sun Feb  8 15:30:35 MSK 2015
>>>    no generation model requested, skipping step
>>> (9) create moses.ini @ Sun Feb  8 15:30:35 MSK 2015
>>>
>>> There are a bunch of files in the ~/working/train folder. Everything
>>> looks OK except for one tiny problem: phrase-table.tgz is only 20
>>> bytes. And, of course, it's not usable at all!
>>>
>>> Can somebody help and point me in the right direction?
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support