i think it's a fair suggestion. Tested and committed. https://github.com/moses-smt/mosesdecoder/commit/87e2ec07627157be70bce515671b4c111ce2dc9c
thx
hieu

On 25/06/2012 02:31, Tom Hoar wrote:
> One more observation/suggestion about the scripts and then I'll give
> it a break.
>
> Several scripts including:
> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
> $SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl
>
> reference the common ./nonbreaking_prefixes folder. The
> split-sentences.perl script uses FindBin, but it would fail because
> there's no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ subfolder.
>
> Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it be
> a good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly
> $SCRIPTS_ROOTDIR/share/ folder where scripts would find shared
> resources? In this case, it could be
> $SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/
>
>
> On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar
> <[email protected]> wrote:
>> I found the following scripts in $SCRIPTS_ROOTDIR with "use FindBin
>> qw($Bin);":
>> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
>> $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
>> $SCRIPTS_ROOTDIR/training/train-model.perl
>> $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
>> $SCRIPTS_ROOTDIR/training/mert-moses.pl
>> $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
>> $SCRIPTS_ROOTDIR/training/zmert-moses.pl
>> $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
>> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>> $SCRIPTS_ROOTDIR/ems/experiment.perl
>> $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
>> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>>
>> I tested the following with "use FindBin qw($RealBin);" and
>> associated updates:
>> tokenizer.perl
>> mert-moses.pl
>> train-model.perl
>>
>> When a user calls the scripts directly in a terminal/console, $Bin
>> and $RealBin versions function identically. If a user calls the
>> $RealBin version via a symlink, the script resolves the scripts' real
>> path, finds the relative paths to dependencies and runs fine.
>> The $Bin version resolves to the symlink's path, can't find the
>> relative paths to dependencies and fails.
>>
>> I'd like to propose that with the planned changes to eliminate
>> references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style
>> Guide include an update for scripts to reference their $RealBin vice
>> the $Bin to support the use of symlinks for all users.
>>
>>
>>
>> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar
>> <[email protected]> wrote:
>>> Ok, I see now. train-model.perl uses:
>>>
>>> use FindBin qw($Bin);
>>> my $SCRIPTS_ROOTDIR = $Bin;
>>>
>>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the
>>> $prefix/bin folder. By default $prefix resolves to /usr/local.
>>> Therefore, $prefix/bin is in $PATH. Our approach confuses the relative
>>> path references in the script that rely on $Bin (without a separate
>>> $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT concatenation on
>>> line 1439 (step 5) caused my system to break because it resolved to
>>> $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't
>>> there.
>>>
>>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR,
>>> and placed symlinks to all $SCRIPTS_ROOTDIR in $prefix/bin. It was the
>>> latter, not the former, that enabled the script to run. My bad in my
>>> email. There are similar challenges with other scripts, such as
>>> tokenizer.perl and detokenizer.perl which reference subfolders
>>> relative to their location.
>>>
>>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA
>>> and $prefixB. Our goals are a little different. We're trying to
>>> install moses & all components into a hierarchy compliant with the
>>> Linux Foundation's Filesystem Hierarchy Standard (FHS), in such a way
>>> that it's usable across most other Posix systems. Here's what we've
>>> come up with:
>>>
>>> $IRSTLM, $RANDLM & $SRILM all point to $prefix.
>>> This way, their resources, as well as the moses and MGIZA++ "make
>>> install" paths, share the same $prefix/bin, $prefix/lib,
>>> $prefix/include (etc) subfolders. We had to play some tricks with
>>> $SRILM to support $prefix/sbin and the $MACHINE_TYPE references, but
>>> that support has been there a long time and works well. Amazingly,
>>> we've been lucky that there are no filename conflicts except for
>>> mkcls in GIZA++ and MGIZA++.
>>>
>>> We place each component's original scripts under
>>> $prefix/lib/<component>, as laid out by the component's authors. For
>>> example, we move MGIZA++ $prefix/scripts to $prefix/lib/mgizapp and
>>> configure moses' $SCRIPTS_ROOTDIR to be $prefix/lib/mosesdecoder,
>>> etc. We use symlinks in $prefix/bin to reference various scripts in
>>> $PATH (as above).
>>>
>>> According to http://perldoc.perl.org/FindBin.html, it looks like
>>> changing to the script's "real" location vice command line reference
>>> is possible with:
>>>
>>> use FindBin qw($RealBin);
>>>
>>> This change will eliminate our need to symlink subfolders in
>>> $prefix/bin, and still allow other Moses users to move the $prefixA
>>> tree anywhere they like. However, it might require a bit more editing
>>> in each script to verify/resolve any relative references.
>>>
>>> For now, I'll continue using folder symlinks. I'll give you access to
>>> a preview copy of our new binary install program via FTP with some
>>> instructions on how to make $prefix a private user folder instead of a
>>> system-level one. Our approaches might give you some ideas about how
>>> moses can support "linguistic programs over time".
>>> For example, in addition to the "standard" Moses components
>>> (giza-pp, mgizapp, irstlm, randlm, srilm), we currently install DoMY
>>> CE (corpus preparation and translation workflow), BerkeleyAligner
>>> (phrase alignment), Champollion Toolkit (sentence aligner), Stanford
>>> Aligner (Chinese/Arabic word seg), MeCab (Japanese word seg), SWATH
>>> (open source Thai word seg), and Langmatch (language ID) add-ons with
>>> this approach. We've also mapped out support for m4loc, Okapi
>>> Framework, and the moses team's sentence aligner.
>>>
>>> Regards,
>>> Tom
>>>
>>>
>>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang
>>> <[email protected]> wrote:
>>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>>> Hieu,
>>>>>
>>>>> We're implementing these changes into DoMY. Some of these broke our
>>>>> layout, but that's okay. We'll adapt to your changes.
>>>> thanks, much appreciated.
>>>>> Updating -external-bin-dir was easy. Then, we scrapped our references
>>>>> to $SCRIPTS_ROOTDIR based on your comments in train-model.perl. This,
>>>>> however, caused step 5 to break. On closer inspection, a reference to
>>>>> $SCRIPTS_ROOTDIR is still necessary at this point.
>>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there but it's
>>>> set in lines 16-20. I didn't change these lines, I just removed the
>>>> ability to override it with some other value from the command line.
>>>>
>>>> are you sure step 5 breaks?
>>>>>
>>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR
>>>>> value? Since all of the scripts are in subfolders from
>>>>> $SCRIPTS_ROOTDIR, do you think it's possible or feasible to set
>>>>> $SCRIPTS_ROOTDIR == $_EXTERNAL_BINDIR? That's possible today by
>>>>> manually configuring bjam for a build. However, if you have another
>>>>> layout in mind, would this cause conflicts?
>>>> I'm aiming for everyone to set up moses like so
>>>>     [directory A]/scripts
>>>>     [directory A]/bin
>>>>     [directory B] = external bin directory
>>>> the external bin directory has giza/mgiza (and hopefully linguistic
>>>> programs over time).
>>>>
>>>> when you update moses, just replace scripts/ and bin/ . The
>>>> external bin directory can stay constant
>>>>
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang
>>>>> <[email protected]> wrote:
>>>>>> Hi all
>>>>>>
>>>>>> if you're checking out the latest github code, there are some
>>>>>> changes you should be aware of:
>>>>>> 1. There is a new argument to train-model.perl
>>>>>>        -external-bin-dir [path]
>>>>>>    This points to the directory where Giza++/mgiza lives. Setting
>>>>>>    this is MANDATORY if you're using train-model.perl to do the
>>>>>>    word alignment. It used to be hardcoded in the perl code itself.
>>>>>> 2. All the training programs have been moved into the directory
>>>>>>        [MOSES-ROOT]/bin
>>>>>>    They should be run from there, not from wherever the source
>>>>>>    code is.
>>>>>> 3. To roll out, simply copy the 2 directories
>>>>>>        [MOSES-ROOT]/bin
>>>>>>        [MOSES-ROOT]/scripts
>>>>>>    to wherever you want, eg.
>>>>>>        /home/hieu/moses/bin
>>>>>>        /home/hieu/moses/scripts
>>>>>> 4. If you don't want to move it anywhere, you can run it from
>>>>>>    where you downloaded.
>>>>>> 5. The EMS and example files have been updated.
>>>>>>
>>>>>> Hope this is ok for everyone. It may break some people's setup. If
>>>>>> possible, please change your setup. It's gonna help us all in the
>>>>>> long run.
>>>>>> If not, flame me & i'll see what I can do
>>>>>>
>>>>>> HH
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> [email protected]
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
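
For readers who want to reproduce the $Bin versus $RealBin behaviour described in this thread without a Moses checkout, here is a minimal sketch. It is written in Python rather than Perl, so it is only an analogy: os.path.abspath stands in for what FindBin's $Bin reports (the directory of the path as invoked, symlinks left alone) and os.path.realpath stands in for $RealBin (all symlinks resolved). The temporary directory layout, the helper names bin_dir, real_bin_dir and demo, and the empty placeholder script are all assumptions made for illustration; nothing here is taken from the Moses scripts themselves.

```python
import os
import tempfile


def bin_dir(invoked_path):
    """Directory of the path as invoked; symlinks are NOT resolved.

    Stands in for FindBin's $Bin in this analogy.
    """
    return os.path.dirname(os.path.abspath(invoked_path))


def real_bin_dir(invoked_path):
    """Directory of the file after resolving every symlink.

    Stands in for FindBin's $RealBin in this analogy.
    """
    return os.path.dirname(os.path.realpath(invoked_path))


def demo():
    # Hypothetical layout, loosely modelled on the thread:
    #   <root>/lib/mosesdecoder/tokenizer/tokenizer.perl   (placeholder file)
    #   <root>/lib/mosesdecoder/tokenizer/nonbreaking_prefixes/
    #   <root>/bin/tokenizer.perl -> symlink to the file above
    root = os.path.realpath(tempfile.mkdtemp())
    script_dir = os.path.join(root, "lib", "mosesdecoder", "tokenizer")
    os.makedirs(os.path.join(script_dir, "nonbreaking_prefixes"))
    script = os.path.join(script_dir, "tokenizer.perl")
    open(script, "w").close()  # empty placeholder, never executed

    flat_bin = os.path.join(root, "bin")
    os.makedirs(flat_bin)
    link = os.path.join(flat_bin, "tokenizer.perl")
    os.symlink(script, link)  # needs a platform that permits symlinks

    # Look for the resource folder relative to each notion of "my directory".
    via_bin = os.path.join(bin_dir(link), "nonbreaking_prefixes")
    via_real = os.path.join(real_bin_dir(link), "nonbreaking_prefixes")
    return os.path.isdir(via_bin), os.path.isdir(via_real)


found_via_bin, found_via_real = demo()
print("found via $Bin-style lookup:    ", found_via_bin)
print("found via $RealBin-style lookup:", found_via_real)
```

When the script is reached through the symlink in the flattened bin folder, the $Bin-style lookup searches next to the symlink and misses the resource folder, while the $RealBin-style lookup searches next to the real file and finds it. Invoking the real path directly would make both lookups agree, which matches the observation above that the two versions behave identically when called directly.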
