I pulled and tested these updates. tokenizer.perl and detokenizer.perl work fine. However, the bjam build didn't copy all of the nonbreaking_prefix.* files to the new location.
I changed the scripts/jamfile from "glob tokenizer/nonbreaking_prefixes/*" to "glob share/nonbreaking_prefixes/*" and everything worked. Can you verify and update github?

Tom

On Tue, 26 Jun 2012 12:24:38 -0400, Hieu Hoang <[email protected]> wrote:
> i think it's a fair suggestion. Tested and committed.
>
> https://github.com/moses-smt/mosesdecoder/commit/87e2ec07627157be70bce515671b4c111ce2dc9c
>
> thx
> hieu
>
> On 25/06/2012 02:31, Tom Hoar wrote:
>> One more observation/suggestion about the scripts and then I'll give it a break.
>>
>> Several scripts including:
>> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>> $SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl
>>
>> reference the common ./nonbreaking_prefixes folder. The split-sentences.perl script uses FindBin, but it would fail because there's no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ subfolder.
>>
>> Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it be a good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly $SCRIPTS_ROOTDIR/share/ folder where scripts would find shared resources?
>> In this case, it could be $SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/
>>
>> On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar <[email protected]> wrote:
>>> I found the following scripts in $SCRIPTS_ROOTDIR with "use FindBin qw($Bin);":
>>> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
>>> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
>>> $SCRIPTS_ROOTDIR/training/train-model.perl
>>> $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
>>> $SCRIPTS_ROOTDIR/training/mert-moses.pl
>>> $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
>>> $SCRIPTS_ROOTDIR/training/zmert-moses.pl
>>> $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
>>> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>>> $SCRIPTS_ROOTDIR/ems/experiment.perl
>>> $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
>>> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>>>
>>> I tested the following with "use FindBin qw($RealBin);" and associated updates:
>>> tokenizer.perl
>>> mert-moses.pl
>>> train-model.perl
>>>
>>> When a user calls the scripts directly in a terminal/console, the $Bin and $RealBin versions function identically. If a user calls the $RealBin version via a symlink, the script resolves its real path, finds the relative paths to dependencies and runs fine. The $Bin version resolves to the symlink's path, can't find the relative paths to dependencies and fails.
>>>
>>> I'd like to propose that with the planned changes to eliminate references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style Guide include an update for scripts to reference their $RealBin rather than their $Bin, to support the use of symlinks for all users.
>>>
>>> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar <[email protected]> wrote:
>>>> Ok, I see now. train-model.perl uses:
>>>>
>>>> use FindBin qw($Bin);
>>>> my $SCRIPTS_ROOTDIR = $Bin;
>>>>
>>>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the $prefix/bin folder.
>>>> By default $prefix resolves to /usr/local. Therefore, $prefix/bin is in $PATH. Our approach confuses the relative path references in the script that rely on $Bin (without a separate $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT concatenation on line 1439 (step 5) caused my system to break because it resolved to $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't there.
>>>>
>>>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR, and placed symlinks to all $SCRIPTS_ROOTDIR subfolders in $prefix/bin. It was the latter, not the former, that enabled the script to run. My bad in my email. There are similar challenges with other scripts, such as tokenizer.perl and detokenizer.perl, which reference subfolders relative to their location.
>>>>
>>>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA and $prefixB. Our goals are a little different. We're trying to install moses & all components into a hierarchy compliant with the Linux Foundation's Filesystem Hierarchy Standard (FHS), in such a way that it's usable across most other POSIX systems. Here's what we've come up with:
>>>>
>>>> $IRSTLM, $RANDLM & $SRILM all point to $prefix. This way, their resources, as well as the moses and MGIZA++ "make install" paths, share the same $prefix/bin, $prefix/lib, $prefix/include (etc) subfolders. We had to play some tricks with $SRILM to support $prefix/sbin and the $MACHINE_TYPE references, but that support has been there a long time and works well. Amazingly, we've been lucky that there are no filename conflicts except for mkcls in GIZA++ and MGIZA++.
>>>>
>>>> We place each component's original scripts under $prefix/lib/<component>, as laid out by the component's authors. For example, we move MGIZA++ $prefix/scripts to $prefix/lib/mgizapp and configure moses' $SCRIPTS_ROOTDIR to be $prefix/lib/mosesdecoder, etc. We use symlinks in $prefix/bin to reference various scripts in $PATH (as above).
>>>>
>>>> According to http://perldoc.perl.org/FindBin.html, it looks like changing to the script's "real" location rather than the command line reference is possible with:
>>>>
>>>> use FindBin qw($RealBin);
>>>>
>>>> This change will eliminate our need to symlink subfolders in $prefix/bin, and still allow other Moses users to move the $prefixA tree anywhere they like. However, it might require a bit more editing in each script to verify/resolve any relative references.
>>>>
>>>> For now, I'll continue using folder symlinks. I'll give you access to a preview copy of our new binary install program via FTP with some instructions on how to make $prefix a private user folder instead of a system-level one. Our approaches might give you some ideas about how moses can support "linguistic programs over time". For example, in addition to the "standard" Moses components (giza-pp, mgizapp, irstlm, randlm, srilm), we currently install DoMY CE (corpus preparation and translation workflow), BerkeleyAligner (phrase alignment), Champollion Toolkit (sentence aligner), Stanford Aligner (Chinese/Arabic word seg), MeCab (Japanese word seg), SWATH (open source Thai word seg), and Langmatch (language ID) add-ons with this approach. We've also mapped out support for m4loc, Okapi Framework, and the moses team's sentence aligner.
>>>>
>>>> Regards,
>>>> Tom
>>>>
>>>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang <[email protected]> wrote:
>>>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>>>> Hieu,
>>>>>>
>>>>>> We're implementing these changes into DoMY.
>>>>>> Some of these broke our layout, but that's okay. We'll adapt to your changes.
>>>>> thanks, much appreciated.
>>>>>> Updating -external-bin-dir was easy. Then, we scrapped our references to $SCRIPTS_ROOTDIR based on your comments in train-model.perl. This, however, caused step 5 to break. On closer inspection, a reference to $SCRIPTS_ROOTDIR is still necessary at this point.
>>>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there but it's set in lines 16-20. I didn't change these lines, I just removed the ability to override it with some other value from the command line.
>>>>>
>>>>> are you sure step 5 breaks?
>>>>>>
>>>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR value? Since all of the scripts are in subfolders of $SCRIPTS_ROOTDIR, do you think it's possible or feasible to set $SCRIPTS_ROOTDIR == $_EXTERNAL_BINDIR? That's possible today by manually configuring bjam for a build. However, if you have another layout in mind, would this cause conflicts?
>>>>> I'm aiming for everyone to set up moses like so
>>>>>    [directory A]/scripts
>>>>>    [directory A]/bin
>>>>>    [directory B] = external bin directory
>>>>> the external bin directory has giza/mgiza (and hopefully linguistic programs over time).
>>>>>
>>>>> when you update moses, just replace scripts/ and bin/ . The external bin directory can stay constant
>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang <[email protected]> wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> if you're checking out the latest github code, there are some changes you should be aware of:
>>>>>>> 1. There is a new argument to train-model.perl
>>>>>>>       -external-bin-dir [path]
>>>>>>>    This points to the directory where Giza++/mgiza lives. Setting this is MANDATORY if you're using train-model.perl to do the word alignment. It used to be hardcoded in the perl code itself.
>>>>>>> 2. All the training programs have been moved into the directory
>>>>>>>       [MOSES-ROOT]/bin
>>>>>>>    They should be run from there, not from wherever the source code is.
>>>>>>> 3. To roll out, simply copy the 2 directories
>>>>>>>       [MOSES-ROOT]/bin
>>>>>>>       [MOSES-ROOT]/scripts
>>>>>>>    to wherever you want, eg.
>>>>>>>       /home/hieu/moses/bin
>>>>>>>       /home/hieu/moses/scripts
>>>>>>> 4. If you don't want to move it anywhere, you can run it from where you downloaded.
>>>>>>> 5. The EMS and example files have been updated.
>>>>>>>
>>>>>>> Hope this is ok for everyone. It may break some people's setup. If possible, please change your setup. It's gonna help us all in the long run. If not, flame me & i'll see what I can do
>>>>>>>
>>>>>>> HH
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> [email protected]
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

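A minimal sketch of the $Bin versus $RealBin behaviour described in the thread above, assuming a Unix-like system where symlink() is available. The directory layout (lib/tokenizer and bin under a temporary folder) and the demo.perl script are hypothetical, made up for illustration; they are not Moses files. The sketch writes a tiny child script, symlinks it into a separate bin folder, runs it through the symlink, and reports what FindBin sees:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical layout, for illustration only (not the Moses tree):
#   $root/lib/tokenizer/demo.perl   the real script
#   $root/bin/demo.perl             a symlink pointing at it
sub bin_and_realbin_via_symlink {
    my $root     = tempdir(CLEANUP => 1);
    my $real_dir = File::Spec->catdir($root, 'lib', 'tokenizer');
    my $link_dir = File::Spec->catdir($root, 'bin');
    make_path($real_dir, $link_dir);

    # The child script only prints what FindBin computed for it.
    my $script = File::Spec->catfile($real_dir, 'demo.perl');
    open(my $fh, '>', $script) or die "cannot write $script: $!";
    print {$fh} 'use FindBin qw($Bin $RealBin); print "$Bin\n$RealBin\n";', "\n";
    close($fh) or die "cannot close $script: $!";

    my $link = File::Spec->catfile($link_dir, 'demo.perl');
    symlink($script, $link) or die "cannot symlink $link: $!";

    # Run the child through the symlink, using the current perl interpreter.
    open(my $out, '-|', $^X, $link) or die "cannot run $link: $!";
    chomp(my @lines = <$out>);
    close($out);

    return @lines;    # ($Bin, $RealBin) as seen by the child script
}

my ($bin, $realbin) = bin_and_realbin_via_symlink();
print "Bin:     $bin\n";
print "RealBin: $realbin\n";
```

When the child is run through the symlink, $Bin should end in the hypothetical bin folder while $RealBin should end in lib/tokenizer, so relative paths built from $RealBin still find resources stored next to the real script.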