[Moses-support] Cross-platform versions of train-model.perl, mert-moses.pl

Tom Hoar Sun, 01 Mar 2015 23:54:07 -0800

The first release of our cross-platform versions of train-model.perl, mert-moses.pl and their sub-scripts is ready for community testing.

In total, we touched 12 Perl and Python files, including one from MGIZA++. The files are listed below. We have changed the file names so the two versions can reside side by side in the same scripts tree. Please note the respective -x or _x suffix before the file extension.


1. lib/mgizapp/scripts/merge_alignment_x.py
2. lib/mosesdecoder/scripts/training/train-model-x.perl
3. lib/mosesdecoder/scripts/training/reduce_combine_x.pl
4. lib/mosesdecoder/scripts/training/mert-moses-x.pl
5. lib/mosesdecoder/scripts/training/LexicalTranslationModelX.pm
6. lib/mosesdecoder/scripts/training/giza2bal-x.pl
7. lib/mosesdecoder/scripts/training/flexibility_score_x.py
8. lib/mosesdecoder/scripts/training/filter-rule-table-x.py
9. lib/mosesdecoder/scripts/training/filter-model-given-input-x.pl
10. lib/mosesdecoder/scripts/generic/score-parallel-x.perl
11. lib/mosesdecoder/scripts/generic/moses_sim_pe_x.py
12. lib/mosesdecoder/scripts/generic/extract-parallel-x.perl

Later this week, we will add these files to trunk. I invite the community's scrutiny on both the Posix and Windows runtimes. These updates required some heavy-duty clean-up, and the results are a significant departure from the previous scripts. So, we took the liberty to also add some new features. Overall, however, the scripts are intended to run identically to the existing Posix-only versions. I'm happy to discuss our strategy/design with anyone who is interested.


UPDATES:

1. Gracefully exit from forked processes. Now, when the script catches a
   subprocess failure, it shuts down ASAP. Typically, this means the other
   forks are terminated as well. A failure in step 2 (GIZA) will therefore
   terminate the run during step 2. Users won't have to trace problems
   that reveal themselves in step 5 back to step 2.
2. Temp file space. The --temp-dir argument is now fully operational in
   all steps that create temp files. By default, train-model.perl uses
   Perl to create a subfolder under the system temp folder. Perl then
   cleans up that folder when the context terminates, unless there's a
   crash, in which case it leaves the folder there (nice touch for
   troubleshooting). The user can manually define a --temp-dir from the
   command line. These manually-created folders are not deleted, since
   users who define their own will probably want to troubleshoot
   problems. So, when update (1) above reports out-of-drive-space errors,
   users can simply mount a new drive and point to it with --temp-dir.
3. All forked processing happens in child processes. The script spins
   off child processes and the parent process simply sits idle and
   waits. Code management is easier this way than with only children
   doing the work in some places and both parent and children doing it
   in other places.
4. The scripts have copies of many housekeeping functions (subs) for
   things like concatenating paths, e.g. new_path(), makedirs(),
   removedirs(), normalize_path(), and others.
5. Those familiar with Windows environments will notice that we
   concatenate a ".exe" to each Windows binary even though Windows does
   not require this. The objective is to minimize tech support requests.
   If we do not rely on the .exe extension, it is possible for the
   Windows environment to launch a .bat, .cmd, or other file instead of
   the binary. Granted, it's probably an edge case, but you never know in
   the Windows world.
6. Full support for data's relative paths from the command prompt. The
   path normalization uses Perl's rel2abs() and Python's
   os.path.abspath() for every file path on the disk system whether
   typed in by a user or read from a config file. The absolute
   reference is always the user's current working directory (CWD).
7. The path concatenation always normalizes path strings to use the
   current host's OS separator. Although Perl and Python (maybe C?) can
   tolerate forward slashes for Windows paths, we normalized them. This
   is necessary for command-line arguments passed to launch new
   subprocesses. If we do it there, it's just as easy to do it
   everywhere.
8. One interesting update. The train-model.perl script never checked if
   the corpus.f and corpus.e files were present on the file system. We
   added that check, and checks for many other input files, too.
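
To illustrate update (1), here is a minimal Python sketch of the fail-fast idea. It is not the code in train-model-x.perl; the function name run_all_or_abort and the polling interval are hypothetical, chosen only for this example. The parent starts every child, waits, and as soon as one child reports a non-zero return code it terminates the remaining children.

```python
import subprocess
import sys
import time


def run_all_or_abort(commands, poll_interval=0.05):
    """Run commands as child processes; stop the rest as soon as one fails.

    Returns 0 if every child succeeded, otherwise the first non-zero
    return code observed. (Hypothetical helper, for illustration only.)
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while children:
            for child in list(children):
                code = child.poll()      # None while the child is running
                if code is None:
                    continue
                children.remove(child)
                if code != 0:
                    return code          # 'finally' shuts down the others
            time.sleep(poll_interval)
        return 0
    finally:
        # Non-empty only when we are bailing out early.
        for child in children:
            child.terminate()
        for child in children:
            child.wait()


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_all_or_abort([slow, bad]))
```

In the demo, the slow child is terminated shortly after the failing child exits, so the parent does not wait the full 30 seconds.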

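The temp folder policy in update (2) can be sketched the same way. This is a Python approximation of the behaviour described above, not the Perl implementation; run_with_temp_dir and the "train-model-" prefix are assumptions made for the example. A folder the script creates itself is removed after a clean run and left in place after a crash, while a folder named by the user is never deleted.

```python
import os
import shutil
import tempfile


def run_with_temp_dir(work, temp_dir=None):
    """Call work(path) with a scratch directory and return (result, path).

    temp_dir=None : create a fresh folder under the system temp folder,
                    delete it on success, keep it if work() raises.
    temp_dir=path : use the caller's folder and never delete it.
    (Hypothetical helper, for illustration only.)
    """
    owned = temp_dir is None
    if owned:
        path = tempfile.mkdtemp(prefix="train-model-")
    else:
        path = os.path.abspath(temp_dir)
        os.makedirs(path, exist_ok=True)
    result = work(path)        # an exception here skips the cleanup below
    if owned:
        shutil.rmtree(path)
    return result, path


if __name__ == "__main__":
    def count_files(path):
        with open(os.path.join(path, "part.0"), "w") as handle:
            handle.write("scratch data\n")
        return len(os.listdir(path))

    result, used = run_with_temp_dir(count_files)
    print(result, os.path.exists(used))
```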

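Updates (5) through (8) all concern paths, so here is one small Python sketch covering them. Only os.path.abspath() is taken from the description above. The function bodies are my own approximation, the names normalize_path and missing_files are reused or invented for illustration, and binary_path with its platform parameter is hypothetical (the parameter exists only so the Windows branch can be exercised on any host).

```python
import os


def normalize_path(path):
    """Return an absolute path relative to the CWD, using the host separator.

    os.path.abspath() also normalizes the path, which on Windows converts
    forward slashes to backslashes.
    """
    return os.path.abspath(path)


def binary_path(directory, name, platform=os.name):
    """Join directory and name, adding '.exe' on Windows if it is missing.

    (Hypothetical helper, for illustration only.)
    """
    if platform == "nt" and not name.lower().endswith(".exe"):
        name += ".exe"
    return normalize_path(os.path.join(directory, name))


def missing_files(paths):
    """Return the normalized paths that do not exist as regular files."""
    return [p for p in map(normalize_path, paths) if not os.path.isfile(p)]


if __name__ == "__main__":
    print(binary_path("bin", "mkcls"))
    print(missing_files(["corpus.f", "corpus.e"]))
```

Running the existence check before any step starts means a missing corpus file is reported immediately rather than as a failure somewhere downstream.
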
WHAT'S NEXT:

In a parallel effort, we have been working to update the binaries. Like the Perl scripts, the first step is to remove all "posix" shell calls except for the 8 dependencies listed below. This means native C/C++ code will replace calls to 'mkdir', 'rm', etc. Concurrently, we are also removing hard-coded path separators and concatenating all paths with native OS separators. Once the Posix dependencies are gone, we'll address specific import library support.

Again, I invite list members to test the new version. Verifying/validating parity performance on Linux is the first QC step. Thanks all!

Tom

These scripts require that the following dependencies (sorry, not listed by step) be present and executable, or they will fail to run.

A) Infrastructure support. We tested with the native Windows binaries of Gow and with the Cygwin versions. There are some bugs, but we need to have full binaries before we can fully debug their use. The system checks that these files are present in the path by executing them and reading their return codes.


1. gsplit[.exe] or split[.exe]
2. gsort[.exe] or sort[.exe]
3. pigz[.exe]  or gzip[.exe]
4. unpigz[.exe] or gunzip[.exe]
5. bzcat[.exe]
6. uniq[.exe]
7. zcat[.exe] (gzip[.exe])
8. cat or type
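
A dependency check of this kind could look roughly like the Python sketch below. This is an illustration under assumptions, not the scripts' actual check: the helper names are hypothetical, and I do not show which arguments the real scripts pass when they execute each tool. shutil.which() finds a candidate on the path, and a separate helper runs a command and reads its return code.

```python
import shutil
import subprocess
import sys


def first_available(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        found = shutil.which(name)   # honours PATHEXT on Windows
        if found:
            return found
    return None


def runs_cleanly(command):
    """Execute command and report whether it exited with return code 0."""
    try:
        done = subprocess.run(command,
                              stdin=subprocess.DEVNULL,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except OSError:                  # not found or not executable
        return False
    return done.returncode == 0


def missing_tools(groups):
    """Return the groups of alternatives for which nothing is on PATH."""
    return [group for group in groups if first_available(group) is None]


if __name__ == "__main__":
    # Alternatives taken from the list above; the preferred name comes first.
    groups = [("gsplit", "split"), ("gsort", "sort"), ("pigz", "gzip"),
              ("unpigz", "gunzip"), ("bzcat",), ("uniq",), ("zcat",)]
    print(missing_tools(groups))
    print(runs_cleanly([sys.executable, "-c", "pass"]))
```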

B) External binaries & scripts, i.e. --external-bin-dir for (M)GIZA++. The system checks that these files are present by executing them from their predicted installed locations and reading their return codes.


1. GIZA++[.exe] or mgiza[.exe] or mgizapp[.exe]
2. snt2cooc.out[.exe] or snt2cooc[.exe] (or snt2cooc.pl if configured)
3. mkcls[.exe]
4. merge_alignment_x.py

C) Moses binaries & scripts

1. giza2bal-x.pl
2. symal[.exe]
3. extract[.exe]
4. eppex[.exe]
5. extract-rules[.exe]
6. extract-ghkm[.exe]
7. extract-parallel-x.perl
8. score[.exe]
9. memscore[.exe]
10. consolidate[.exe]
11. score-parallel-x.perl
12. flexibility_score_x.py
13. mert[.exe]
14. extractor[.exe]
15. pro[.exe]
16. kbmira[.exe]
17. evaluator[.exe]
18. filter-model-given-input-x.pl
19. remove-segmentation-markup.perl
20. promix/main.py (not complete)
21. moses[.exe] or moses-cmd.exe
22. moses_sim_pe_x.py
23. NOTE: NOT qsub-wrapper-x.pl or other moses-parallel works

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
