I updated and tested these changes. tokenizer.perl and detokenizer.perl
 work fine. However, the bjam build didn't copy all of the 
 nonbreaking_prefix.* files to the new location.

 I changed the scripts/jamfile from "glob 
 tokenizer/nonbreaking_prefixes/*" to "glob share/nonbreaking_prefixes/*" 
 and everything worked.
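
 For reference, here is a minimal sketch of how a script could locate the
 shared folder once it lives under share/. It assumes the script sits in
 scripts/tokenizer/ and the prefix files in scripts/share/nonbreaking_prefixes/;
 the helper name prefix_file_for is hypothetical and not taken from the
 Moses scripts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin qw($RealBin);   # directory of this script, symlinks resolved
use File::Spec;

# Assumption: this script lives in scripts/tokenizer/, so the shared
# folder is one level up, in scripts/share/nonbreaking_prefixes/.
my $prefix_dir =
    File::Spec->catdir($RealBin, '..', 'share', 'nonbreaking_prefixes');

# Hypothetical helper: return the prefix file for a language code,
# or nothing (undef in scalar context) if it does not exist.
sub prefix_file_for {
    my ($lang) = @_;
    my $file = File::Spec->catfile($prefix_dir, "nonbreaking_prefix.$lang");
    return $file if -e $file;
    return;
}

my $lang = shift(@ARGV) // 'en';
my $file = prefix_file_for($lang);
if (defined $file) {
    print "Using $file\n";
}
else {
    warn "No nonbreaking prefix file for '$lang' in $prefix_dir\n";
}
```

 Because the path is built from $RealBin, the lookup should behave the same
 whether the script is called directly or through a symlink in $prefix/bin.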

 Can you verify and update github?

 Tom



 On Tue, 26 Jun 2012 12:24:38 -0400, Hieu Hoang 
 <[email protected]> wrote:
> i think it's a fair suggestion. Tested and committed.
>
> 
> https://github.com/moses-smt/mosesdecoder/commit/87e2ec07627157be70bce515671b4c111ce2dc9c
>
> thx
> hieu
>
> On 25/06/2012 02:31, Tom Hoar wrote:
>> One more observation/suggestion about the scripts and then I'll give 
>> it a break.
>>
>> Several scripts including:
>> $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>> $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>> $SCRIPTS_ROOTDIR/tokenizer/detokenizer.perl
>>
>> reference the common ./nonbreaking_prefixes folder. The 
>> split-sentences.perl script uses FindBin, but it would fail because 
>> there's no $SCRIPTS_ROOTDIR/ems/support/nonbreaking_prefixes/ 
>> subfolder.
>>
>> Since the $SCRIPTS_ROOTDIR hierarchy is being formalized, would it 
>> be a good idea to create a new $SCRIPTS_ROOTDIR/resources/ or possibly 
>> $SCRIPTS_ROOTDIR/share/ folder where scripts would find shared 
>> resources? In this case, it could be 
>> $SCRIPTS_ROOTDIR/share/nonbreaking_prefixes/
>>
>>
>> On Mon, 25 Jun 2012 11:42:38 +0700, Tom Hoar 
>> <[email protected]> wrote:
>>> I found the following scripts in $SCRIPTS_ROOTDIR with "use FindBin
>>> qw($Bin);":
>>>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-berkeley.perl
>>>   $SCRIPTS_ROOTDIR/training/wrappers/parse-de-bitpar.perl
>>>   $SCRIPTS_ROOTDIR/training/train-model.perl
>>>   $SCRIPTS_ROOTDIR/training/filter-model-given-input.pl
>>>   $SCRIPTS_ROOTDIR/training/mert-moses.pl
>>>   $SCRIPTS_ROOTDIR/training/mert-moses-multi.pl
>>>   $SCRIPTS_ROOTDIR/training/zmert-moses.pl
>>>   $SCRIPTS_ROOTDIR/analysis/weight-scan.pl
>>>   $SCRIPTS_ROOTDIR/ems/support/split-sentences.perl
>>>   $SCRIPTS_ROOTDIR/ems/experiment.perl
>>>   $SCRIPTS_ROOTDIR/generic/trainlm-irst.perl
>>>   $SCRIPTS_ROOTDIR/tokenizer/tokenizer.perl
>>>
>>> I tested the following with "use FindBin qw($RealBin);" and
>>> associated updates:
>>>   tokenizer.perl
>>>   mert-moses.pl
>>>   train-model.perl
>>>
>>> When a user calls the scripts directly in a terminal/console, the $Bin
>>> and $RealBin versions function identically. If a user calls the
>>> $RealBin version via a symlink, the script resolves its own real
>>> path, finds the relative paths to its dependencies and runs fine. The
>>> $Bin version resolves to the symlink's path, can't find the relative
>>> paths to its dependencies and fails.
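
A small sketch to make the difference visible. It only prints what FindBin
reports and checks one relative dependency; the dependency path is the one
quoted elsewhere in this thread and is used purely as an illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin qw($Bin $RealBin);

# $Bin:     directory of the path used to invoke the script
#           (a symlink's own directory if called through a symlink).
# $RealBin: the same directory with all symlinks resolved.
print "Bin:     $Bin\n";
print "RealBin: $RealBin\n";

# Check a relative dependency against both base directories.
# Illustrative path only; adjust to a file that exists next to the script.
for my $base ($Bin, $RealBin) {
    my $dep = "$base/generic/extract-parallel.perl";
    printf "%-8s %s\n", (-e $dep ? 'found' : 'missing'), $dep;
}
```

Run directly, both lines should show the same directory. Run through a
symlink placed in another folder, $Bin should show the symlink's folder
while $RealBin still shows where the script really lives.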
>>>
>>> I'd like to propose that, with the planned changes to eliminate
>>> references to $SCRIPTS_ROOTDIR, the Moses Code Guide and/or the Style
>>> Guide include an update for scripts to reference their $RealBin
>>> instead of $Bin, to support the use of symlinks for all users.
>>>
>>>
>>>
>>> On Fri, 22 Jun 2012 08:39:56 +0700, Tom Hoar
>>> <[email protected]> wrote:
>>>> Ok, I see now. train-model.perl uses:
>>>>
>>>>    use FindBin qw($Bin);
>>>>    my $SCRIPTS_ROOTDIR = $Bin;
>>>>
>>>> We use symlinks to flatten the scripts in $SCRIPTS_ROOTDIR into the
>>>> $prefix/bin folder. By default $prefix resolves to /usr/local.
>>>> Therefore, $prefix/bin is in $PATH. Our approach confuses the
>>>> relative path references in the script that rely on $Bin (without a
>>>> separate $SCRIPTS_ROOTDIR). In this case, the $PHRASE_EXTRACT
>>>> concatenation on line 1439 (step 5) caused my system to break
>>>> because it resolved to
>>>> $PHRASE_EXTRACT = "$Bin/generic/extract-parallel.perl", which wasn't
>>>> there.
>>>>
>>>> In last night's troubleshooting, I re-referenced $SCRIPTS_ROOTDIR,
>>>> and placed symlinks to all $SCRIPTS_ROOTDIR subfolders in
>>>> $prefix/bin. It was the latter, not the former, that enabled the
>>>> script to run. My bad in my email. There are similar challenges with
>>>> other scripts, such as tokenizer.perl and detokenizer.perl, which
>>>> reference subfolders relative to their location.
>>>>
>>>> Also, thanks for sharing the goal. Effectively, you'll have $prefixA
>>>> and $prefixB. Our goals are a little different. We're trying to
>>>> install moses & all components into a hierarchy compliant with the
>>>> Linux Foundation's Filesystem Hierarchy Standard (FHS), in such a way
>>>> that it's usable across most other POSIX systems. Here's what we've
>>>> come up with:
>>>>
>>>> $IRSTLM, $RANDLM & $SRILM all point to $prefix. This way, their
>>>> resources, as well as the moses and MGIZA++ "make install" paths,
>>>> share the same $prefix/bin, $prefix/lib, $prefix/include (etc)
>>>> subfolders. We had to play some tricks with $SRILM to support
>>>> $prefix/sbin and the $MACHINE_TYPE references, but that support has
>>>> been there a long time and works well. Amazingly, we've been lucky
>>>> that there are no filename conflicts except for mkcls in GIZA++ and
>>>> MGIZA++.
>>>>
>>>> We place each component's original scripts under
>>>> $prefix/lib/<component>, as laid out by the component's authors. For
>>>> example, we move MGIZA++'s $prefix/scripts to $prefix/lib/mgizapp,
>>>> moses' $SCRIPTS_ROOTDIR becomes $prefix/lib/mosesdecoder, etc. We
>>>> use symlinks in $prefix/bin to reference various scripts in $PATH
>>>> (as above).
>>>>
>>>> According to http://perldoc.perl.org/FindBin.html, it looks like
>>>> changing to the script's "real" location vice command line reference
>>>> is possible with:
>>>>
>>>>    use FindBin qw($RealBin);
>>>>
>>>> This change will eliminate our need to symlink subfolders in
>>>> $prefix/bin, and still allow other Moses users to move the $prefixA
>>>> tree anywhere they like. However, it might require a bit more editing
>>>> in each script to verify/resolve any relative references.
>>>>
>>>> For now, I'll continue using folder symlinks. I'll give you access to
>>>> a preview copy of our new binary install program via FTP with some
>>>> instructions on how to make $prefix a private user folder instead of
>>>> a system-level one. Our approaches might give you some ideas about
>>>> how moses can support "linguistic programs over time". For example,
>>>> in addition to the "standard" Moses components (giza-pp, mgizapp,
>>>> irstlm, randlm, srilm), we currently install DoMY CE (corpus
>>>> preparation and translation workflow), BerkeleyAligner (phrase
>>>> alignment), Champollion Toolkit (sentence aligner), Stanford Aligner
>>>> (Chinese/Arabic word seg), MeCab (Japanese word seg), SWATH (open
>>>> source Thai word seg), and Langmatch (language ID) add-ons with this
>>>> approach. We've also mapped out support for m4loc, Okapi Framework,
>>>> and the moses team's sentence aligner.
>>>>
>>>> Regards,
>>>> Tom
>>>>
>>>>
>>>> On Thu, 21 Jun 2012 23:49:35 +0100, Hieu Hoang
>>>> <[email protected]> wrote:
>>>>> On 21/06/2012 16:46, Tom Hoar wrote:
>>>>>> Hieu,
>>>>>>
>>>>>> We're implementing these changes into DoMY. Some of these broke our
>>>>>> layout, but that's okay. We'll adapt to your changes.
>>>>> thanks, much appreciated.
>>>>>> Updating -external-bin-dir was easy. Then, we scrapped our
>>>>>> references to $SCRIPTS_ROOTDIR based on your comments in
>>>>>> train-model.perl. This, however, caused step 5 to break. On closer
>>>>>> inspection, a reference to $SCRIPTS_ROOTDIR is still necessary at
>>>>>> this point.
>>>>> that's odd. The variable $SCRIPTS_ROOTDIR is still there but it's set
>>>>> in lines 16-20. I didn't change these lines, I just removed the
>>>>> ability to override it with some other value from the command line.
>>>>>
>>>>> are you sure step 5 breaks?
>>>>>>
>>>>>> How do you see the layout evolving without the $SCRIPTS_ROOTDIR
>>>>>> value? Since all of the scripts are in subfolders from
>>>>>> $SCRIPTS_ROOTDIR, do you think it's possible or feasible to set
>>>>>> $SCRIPTS_ROOTDIR == $_EXTERNAL_BINDIR? That's possible today by
>>>>>> manually configuring bjam for a build. However, if you have another
>>>>>> layout in mind, would this cause conflicts?
>>>>> I'm aiming for everyone to set up moses like so
>>>>>     [directory A]/scripts
>>>>>     [directory A]/bin
>>>>>     [directory B]   = external bin directory
>>>>> the external bin directory has giza/mgiza (and hopefully linguistic
>>>>> programs over time).
>>>>>
>>>>> when you update moses, just replace scripts/ and bin/ . The external
>>>>> bin directory can stay constant
>>>>>
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>
>>>>>> On Thu, 31 May 2012 20:42:56 +0100, Hieu Hoang
>>>>>> <[email protected]> wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> if you're checking out the latest github code, there are some
>>>>>>> changes you should be aware of:
>>>>>>>     1. There is a new argument to train-model.perl
>>>>>>>             -external-bin-dir [path]
>>>>>>>        This points to the directory where Giza++/mgiza lives.
>>>>>>>        Setting this is MANDATORY if you're using train-model.perl
>>>>>>>        to do the word alignment. It used to be hardcoded in the
>>>>>>>        perl code itself.
>>>>>>>     2. All the training programs have been moved into the directory
>>>>>>>            [MOSES-ROOT]/bin
>>>>>>>        They should be run from there, not from wherever the source
>>>>>>>        code is.
>>>>>>>     3. To roll out, simply copy the 2 directories
>>>>>>>            [MOSES-ROOT]/bin
>>>>>>>            [MOSES-ROOT]/scripts
>>>>>>>        to wherever you want, eg.
>>>>>>>            /home/hieu/moses/bin
>>>>>>>            /home/hieu/moses/scripts
>>>>>>>     4. If you don't want to move it anywhere, you can run it from
>>>>>>>        where you downloaded.
>>>>>>>     5. The EMS and example files have been updated.
>>>>>>>
>>>>>>> Hope this is ok for everyone. It may break some people's setup.
>>>>>>> If possible, please change your setup. It's gonna help us all in
>>>>>>> the long run. If not, flame me & i'll see what I can do
>>>>>>>
>>>>>>> HH
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> [email protected]
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>>
>>
>>