I think it's a fair suggestion. Tested and committed.
   
https://github.com/moses-smt/mosesdecoder/commit/87e2ec07627157be70bce515671b4c111ce2dc9c

thx
hieu

On 25/06/2012 02:31, Tom Hoar wrote:
> One more observation/suggestion about the scripts and then I'll give 
> it a break.
>
> Several scripts including:
> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
> $SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl
>
> reference the common ./nonbreaking_prefixes folder. The 
> split-sentences.perl script uses FindBin, but it would fail because 
> there's no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ subfolder.
>
> Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it be 
> a good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly 
> $SCRIPTS_ROOTDIR/share/ folder where scripts would find shared 
> resources? In this case, it could be 
> $SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/
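
A minimal sketch of how a script might locate such a shared folder, assuming the proposed share/nonbreaking_prefixes layout. That layout is only the suggestion made in this thread, and this is not necessarily what the commit above does. It is written in Python rather than Perl purely to keep it self-contained. The name find_shared_dir, its parameters, and the walk-up-the-tree strategy are all hypothetical.

```python
import os


def find_shared_dir(script_dir, name="nonbreaking_prefixes", shared="share"):
    """Return the first <ancestor>/<shared>/<name> directory found, else None.

    Starts at the script's real directory and walks up towards the
    filesystem root, so scripts at different depths find the same folder.
    """
    current = os.path.realpath(script_dir)
    while True:
        candidate = os.path.join(current, shared, name)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding the folder.
            return None
        current = parent
```

With a lookup like this, split-sentences.perl under ems/support/ and tokenizer.perl under tokenizer/ would both resolve to the same folder without each needing its own nonbreaking_prefixes subfolder.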
>
>
> On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar wrote:
>> I found the following scripts in $SCRIPTS_ROOTDIR with "use FindBin
>> qw($Bin);":
>>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
>>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
>>   $SCRIPTS_ROOTDIR/training/train-model.perl
>>   $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
>>   $SCRIPTS_ROOTDIR/training/mert-moses.pl
>>   $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
>>   $SCRIPTS_ROOTDIR/training/zmert-moses.pl
>>   $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
>>   $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>>   $SCRIPTS_ROOTDIR/ems/experiment.perl
>>   $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
>>   $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>>
>> I tested the following with "use FindBin qw($RealBin);" and
>> associated updates:
>>   tokenizer.perl
>>   mert-moses.pl
>>   train-model.perl
>>
>> When a user calls the scripts directly in a terminal/console, the $Bin
>> and $RealBin versions function identically. If a user calls the
>> $RealBin version via a symlink, the script resolves its own real
>> path, finds the relative paths to dependencies and runs fine. The $Bin
>> version resolves to the symlink's path, can't find the relative paths
>> to dependencies and fails.
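
The difference described above can be sketched as follows. This is an analogue in Python, not the Perl FindBin code itself: os.path.abspath stands in for $Bin (symlinks kept) and os.path.realpath for $RealBin (symlinks resolved). The directory layout and the helper names script_dirs and make_demo_tree are assumptions for illustration only.

```python
import os
import tempfile


def script_dirs(invoked_path):
    """Return (bin_dir, real_bin_dir) for a script path as it was invoked."""
    # Like $Bin: directory of the path used on the command line (symlinks kept).
    bin_dir = os.path.dirname(os.path.abspath(invoked_path))
    # Like $RealBin: directory of the file after every symlink is resolved.
    real_bin_dir = os.path.dirname(os.path.realpath(invoked_path))
    return bin_dir, real_bin_dir


def make_demo_tree():
    """Build a throwaway tree (hypothetical layout) with one symlinked script."""
    root = os.path.realpath(tempfile.mkdtemp())
    training = os.path.join(root, "lib", "mosesdecoder", "training")
    bindir = os.path.join(root, "bin")
    os.makedirs(training)
    os.makedirs(bindir)
    script = os.path.join(training, "train-model.perl")
    open(script, "w").close()
    link = os.path.join(bindir, "train-model.perl")
    os.symlink(script, link)
    return script, link


if __name__ == "__main__":
    script, link = make_demo_tree()
    print("direct :", script_dirs(script))
    print("symlink:", script_dirs(link))
```

Called directly, both values are the training/ directory. Called through the symlink, the first value is the bin/ directory, where relative paths to dependencies do not exist, while the second still points at training/.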
>>
>> I'd like to propose that, with the planned changes to eliminate
>> references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style
>> Guide include an update for scripts to reference $RealBin instead of
>> $Bin, to support the use of symlinks for all users.
>>
>>
>>
>> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar wrote:
>>> Ok, I see now. train-model.perl uses:
>>>
>>>    use FindBin qw($Bin);
>>>    my $SCRIPTS_ROOTDIR = $Bin;
>>>
>>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the
>>> $prefix/bin folder. By default $prefix resolves to /usr/local.
>>> Therefore, $prefix/bin is in $PATH. Our approach confuses the relative
>>> path references in the script that rely on $Bin (without a separate
>>> $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT concatenation on
>>> line 1439 (step 5) caused my system to break because it resolved to
>>> $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't
>>> there.
>>>
>>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR
>>> and placed symlinks to all $SCRIPTS_ROOTDIR subfolders in $prefix/bin.
>>> It was the latter, not the former, that enabled the script to run. My
>>> bad in my email. There are similar challenges with other scripts, such as
>>> tokenizer.perl and detokenizer.perl which reference subfolders
>>> relative to their location.
>>>
>>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA
>>> and $prefixB. Our goals are a little different. We're trying to
>>> install moses & all components into a hierarchy compliant with the Linux
>>> Foundation's Filesystem Hierarchy Standard (FHS), in such a way that it's
>>> usable across most other POSIX systems. Here's what we've come up
>>> with:
>>>
>>> $IRSTLM, $RANDLM & $SRILM all point to $prefix. This way, their
>>> resources, as well as the moses and MGIZA++ "make install" paths,
>>> share the same $prefix/bin, $prefix/lib, $prefix/include (etc)
>>> subfolders. We had to play some tricks with $SRILM to support
>>> $prefix/sbin and the $MACHINE_TYPE references, but that support has
>>> been there a long time and works well. Amazingly, we've been lucky
>>> that there are no filename conflicts except for mkcls in GIZA++ and
>>> MGIZA++.
>>>
>>> We place each component's original scripts under
>>> $prefix/lib/<component>, as laid out by the component's authors. For
>>> example, we move MGIZA++ $prefix/scripts to $prefix/lib/mgizapp and
>>> configure moses' $SCRIPTS_ROOTDIR as $prefix/lib/mosesdecoder,
>>> etc. We use symlinks in $prefix/bin to reference various scripts in
>>> $PATH (as above).
>>>
>>> According to http://perldoc.perl.org/FindBin.html, it looks like
>>> switching to the script's "real" location instead of the command-line
>>> reference is possible with:
>>>
>>>    use FindBin qw($RealBin);
>>>
>>> This change will eliminate our need to symlink subfolders in
>>> $prefix/bin, and still allow other Moses users to move $prefixA tree
>>> anywhere they like. However, it might require a bit more editing in
>>> each script to verify/resolve any relative references.
>>>
>>> For now, I'll continue using folder symlinks. I'll give you access to
>>> a preview copy of our new binary install program via FTP with some
>>> instructions how to make $prefix a private user folder instead of a
>>> system level. Our approaches might give you some ideas about how moses
>>> can support "linguistic programs over time". For example, in addition
>>> to the "standard" Moses components (giza-pp, mgizapp, irstlm, randlm,
>>> srilm), we currently install DoMY CE (corpus preparation and
>>> translation workflow), BerkeleyAligner (phrase alignment), Champollion
>>> Toolkit (sentence aligner), Stanford Aligner (Chinese/Arabic word
>>> seg), MeCab (Japanese word seg), SWATH (open source Thai word seg),
>>> and Langmatch (language ID) add-ons with this approach. We've also
>>> mapped out support for m4loc, Okapi Framework, and the moses team's
>>> sentence aligner.
>>>
>>> Regards,
>>> Tom
>>>
>>>
>>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang wrote:
>>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>>> Hieu,
>>>>>
>>>>> We're implementing these changes into DoMY. Some of these broke our
>>>>> layout, but that's okay. We'll adapt to your changes.
>>>> thanks, much appreciated.
>>>>> Updating -external-bin-dir was easy. Then, we scrapped our references
>>>>> to $SCRIPTS_ROOTDIR based on your comments in train-model.perl. This,
>>>>> however, caused step 5 to break. On closer inspection, a reference to
>>>>> $SCRIPTS_ROOTDIR is still necessary at this point.
>>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there but it's set in
>>>> lines 16-20. I didn't change these lines, I just removed the ability to
>>>> override it with some other value from the command line.
>>>>
>>>> are you sure step 5 breaks?
>>>>>
>>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR value?
>>>>> Since all of the scripts are in subfolders from $SCRIPTS_ROOTDIR, do
>>>>> you think it's possible or feasible to set $SCRIPTS_ROOTDIR ==
>>>>> $_EXTERNAL_BINDIR? That's possible today by manually configuring bjam
>>>>> for a build. However, if you have another layout in mind, would this
>>>>> cause conflicts?
>>>> I'm aiming for everyone to set up moses like so
>>>>     [directory A]/scripts
>>>>     [directory A]/bin
>>>>     [directory B]   = external bin directory
>>>> the external bin directory has giza/mgiza (and hopefully linguistic
>>>> programs over time).
>>>>
>>>> when you update moses, just replace scripts/ and bin/. The external bin
>>>> directory can stay constant
>>>>
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang wrote:
>>>>>> Hi all
>>>>>>
>>>>>> if you're checking out the latest github code, there are some changes
>>>>>> you should be aware of:
>>>>>>     1. There is a new argument to train-model.perl
>>>>>>             -external-bin-dir [path]
>>>>>>         This points to the directory where Giza++/mgiza lives. Setting
>>>>>>         this is MANDATORY if you're using train-model.perl to do the word
>>>>>>         alignment. It used to be hardcoded in the perl code itself.
>>>>>>     2. All the training programs have been moved into the directory
>>>>>>            [MOSES-ROOT]/bin
>>>>>>         They should be run from there, not from wherever the source
>>>>>>         code is.
>>>>>>     3. To roll out, simply copy the 2 directories
>>>>>>            [MOSES-ROOT]/bin
>>>>>>            [MOSES-ROOT]/scripts
>>>>>>         to wherever you want, eg.
>>>>>>            /home/hieu/moses/bin
>>>>>>            /home/hieu/moses/scripts
>>>>>>     4. If you don't want to move it anywhere, you can run it from where
>>>>>>         you downloaded.
>>>>>>     5. The EMS and example files have been updated.
>>>>>>
>>>>>> Hope this is ok for everyone. It may break some people's setup. If
>>>>>> possible, please change your setup. It's gonna help us all in the long
>>>>>> run. If not, flame me & i'll see what I can do
>>>>>>
>>>>>> HH
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> [email protected]
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>
>