Ken,
I'd sign your petition, but we're going further. We're now working on a
significant upgrade to train-model.perl (and soon mert-moses.pl) so that
it runs on native Win64 (developed and tested on 64-bit Strawberry Perl).

It's far enough along that train-model.perl properly handles drive
letters and OS path separators (slash vs. backslash), auto-appends ".exe"
extensions to binaries, etc. We've replaced most system calls such as
`rm`, `wc`, `time` and `cat` with native Perl code, and that work
continues. The script relies on the Moses binaries at their current
relative paths and on the (M)GIZA++ binaries in the -external-bin-dir
directory. The only non-Moses external binaries will be sort (gsort),
split (gsplit), gzip and bzcat (unless others turn up). We're testing
with 32-bit Gow binaries (does anyone have 64-bit builds?). As on POSIX
systems, these will have to be on the Win64 system path. I'm not sure
yet how we'll handle symlinks. Suggestions welcome.

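Since those external tools have to be found on the path, a wrapper can check for them before a week-long run starts. Here's a minimal bash sketch of such a pre-flight check; it assumes a POSIX-style shell (Linux, Mac, or an MSYS/Cygwin-like shell on Windows), and the function names and candidate lists are hypothetical, not part of train-model-x.perl:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a wrapper script (not part of
# train-model-x.perl): confirm the non-Moses external tools are on PATH.

# resolve_tool CANDIDATE...
# Prints the location of the first candidate found on PATH and returns 0;
# returns 1 (printing nothing) if none is found.
resolve_tool() {
    local name path
    for name in "$@"; do
        if path=$(command -v "$name"); then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# check_tool LABEL CANDIDATE...
# Logs "LABEL -> location" on success, or a "missing" line on stderr.
check_tool() {
    local label=$1 path
    shift
    if path=$(resolve_tool "$@"); then
        printf '%s -> %s\n' "$label" "$path"
    else
        printf 'missing: %s (tried: %s)\n' "$label" "$*" >&2
        return 1
    fi
}

# check_external_tools
# Tool list: sort, split, gzip, bzcat. Trying the g-prefixed name first is
# an assumption that prefers GNU builds where both names exist.
check_external_tools() {
    local rc=0
    check_tool sort  gsort sort   || rc=1
    check_tool split gsplit split || rc=1
    check_tool gzip  gzip         || rc=1
    check_tool bzcat bzcat        || rc=1
    return "$rc"
}
```

A wrapper would call `check_external_tools || exit 1` before the first step. Preferring `gsort`/`gsplit` matters where a non-GNU `sort` sits earlier on the path, e.g. BSD `sort` on a Mac, or the unrelated `sort.exe` that Windows ships in System32.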
To complement the Perl work, Jeroen is updating the phrase-based Moses
code (maybe phrase-factored, too) to run on native Win64, including lmplz
and query. In the end, the entire train/tune/translate tool chain will
run on native Win64 without Cygwin.

Back to your petition idea. Our original plan for this upgrade already
included adding return-code checking and pass-through (I was researching
that when your message came in). We're adding formatted log 'print'
statements immediately after the close() statements (or equivalents) to
report the full paths of each step's final output files. Between return
codes and screen-scrapers, any wrapper should be able to detect the
success or failure of any step. We've also done a general cleanup (e.g.,
normalizing indentation to 4 spaces). The goal: maintain the existing
POSIX (Linux/Mac) use case, add the native Win64 use case AND improve
reliability when run from wrappers, all without changing the current
business rules or SMT functions.

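For anyone wrapping the current scripts before the -x versions land, here's a minimal bash sketch of the same idea from the wrapper side: pass a step's exit code through, then check that each expected output file exists and isn't empty. The function names and the "OUTPUT"/"FAILED" log formats are hypothetical, not what train-model-x.perl will print. For .gz files it checks the decompressed content, because an empty gzip stream is still a small non-empty file, which is likely why the phrase-table.gz in the quoted thread is 20 bytes rather than 0.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper helpers (not part of train-model-x.perl): pass a
# step's exit code through and verify its output files. The "OUTPUT" and
# "FAILED" log formats are made up for this sketch.

# output_has_data FILE
# Succeeds if FILE exists and contains at least one byte. For *.gz files the
# decompressed stream is tested instead, because gzip of empty input is
# still a non-empty file (typically 20 bytes).
output_has_data() {
    local f=$1
    [ -f "$f" ] || return 1
    case "$f" in
        *.gz) [ "$(gzip -cd "$f" 2>/dev/null | head -c 1 | wc -c)" -gt 0 ] ;;
        *)    [ -s "$f" ] ;;
    esac
}

# run_checked LABEL COMMAND [ARGS...]
# Runs COMMAND and returns its exit code, logging a failure to stderr.
run_checked() {
    local label=$1
    shift
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        printf 'FAILED %s: exit code %d\n' "$label" "$rc" >&2
    fi
    return "$rc"
}

# verify_outputs LABEL FILE...
# Prints one "OUTPUT" line per good file; fails on the first missing or
# empty one.
verify_outputs() {
    local label=$1 f
    shift
    for f in "$@"; do
        if ! output_has_data "$f"; then
            printf 'FAILED %s: missing or empty output %s\n' "$label" "$f" >&2
            return 1
        fi
        printf 'OUTPUT %s: %s\n' "$label" "$f"
    done
}
```

A wrapper might call `run_checked train "$MOSES/scripts/training/train-model.perl" ... || exit 1` and then `verify_outputs train "$ROOT/model/phrase-table.gz" "$ROOT/model/moses.ini" || exit 1`, where `$MOSES` and `$ROOT` are placeholders for the Moses checkout and the -root-dir. The `|| exit 1` form also keeps `set -e` from aborting before the failure is logged.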
We've changed the Perl script names: `train-model.perl` becomes
`train-model-x.perl` and `mert-moses.pl` becomes `mert-moses-x.pl` ("x"
for "cross-platform"). We'll add them to the trunk when they're ready
(early March?). Hopefully enough people will test and validate them to
make them reliable and robust. Maybe they can eventually replace the
current scripts? As of today, steps 3, 4 and 9 are fully tested with
Strawberry Perl Win64 and Gow on WinXP64 and Wine64.

Comments, requests and volunteers from the wider Moses community are welcome.

On 02/14/2015 09:54 PM, Kenneth Heafield wrote:
> Sign my petition to add return code checking to train-model.perl.
>
> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>> An empty phrase-table.gz file is usually the result of an ill-prepared
>> training corpus. Make sure you run the final corpus through
>> clean-corpus-n.perl.
>>
>>
>>
>> On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>>> Hello, everybody!
>>>
>>> I have a problem with Moses. I created a big parallel corpus by
>>> concatenating a number of existing corpora from
>>> http://opus.lingfil.uu.se. After that I cleaned up the results (the
>>> tokenizer script reported some errors, so I deleted the error-prone
>>> rows from both sides).
>>>
>>> Then I started training the translation model with MGIZA++ using this
>>> command:
>>>
>>> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>>> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>>> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>>> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>>> -external-bin-dir /opt/moses/mgiza >& training.out &
>>>
>>> After a week of work, this is at the end of training.out:
>>> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
>>> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>>> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>>> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>>> msd wbe-msd-bidirectional-fe"
>>> Lexical Reordering Scorer
>>> scores lexical reordering models of several types (hierarchical,
>>> phrase-based and word-based-extraction
>>> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
>>> no generation model requested, skipping step
>>> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
>>>
>>> There are a bunch of files in the ~/working/train folder. Everything
>>> looks OK except for one small problem: phrase-table.gz is only 20
>>> bytes. And, of course, it's not usable at all!
>>>
>>> Can somebody help and point me in the right direction?
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support