Re: [Moses-support] Train recasing model using IRSTLM

Jehan Pages Fri, 25 Nov 2011 08:06:38 -0800

Hi,

Sorry if I was not clear. The current version, the one you likely got
from the Moses repository, is indeed SRILM-only. The -lm option I wrote
about is brand new: I wrote it today, made a pull request to the
upstream Moses repository, and I see it was merged into the main
repository about an hour ago.


So if you pull the latest code now, you'll have this option. When you
wrote your email, it was not yet available, so there is no need to
apologize (and even if the option had existed before, there would be no
need to apologize either! Besides, I am only just discovering Moses and
its possibilities myself).

Also, I see you are comparing it with the -lm option of
train-model.perl. The one I wrote has a different syntax.
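To illustrate the difference: train-model.perl's --lm takes a single colon-separated factor:order:filename:type value (the log quoted further down passes 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1), while the new train-recaser.perl option takes just a toolkit name such as irstlm. Below is a minimal POSIX sh sketch that splits the former into its fields. The parse_lm_spec helper is hypothetical, and the type codes (0 = SRILM, 1 = IRSTLM, 8 = KenLM) are assumed from the Moses conventions of the time, so please check them against your own train-model.perl.

```sh
#!/bin/sh
# Sketch: split a train-model.perl --lm value of the form
# factor:order:filename:type into its four fields.
# ASSUMPTION: type codes 0/1/8 mean SRILM/IRSTLM/KenLM (verify locally).
parse_lm_spec() {
    spec=$1
    factor=${spec%%:*}     # text before the first ':'
    rest=${spec#*:}
    order=${rest%%:*}      # second field
    rest=${rest#*:}
    file=${rest%:*}        # up to the last ':' (so the path may contain ':')
    type=${rest##*:}       # text after the last ':'
    case $type in
        0) impl=SRILM ;;
        1) impl=IRSTLM ;;
        8) impl=KenLM ;;
        *) impl="unknown ($type)" ;;
    esac
    printf 'factor=%s order=%s file=%s implementation=%s\n' \
        "$factor" "$order" "$file" "$impl"
}

# The value from the train-model.perl call in the log below:
parse_lm_spec 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1
# prints: factor=0 order=3 file=/home/user/moses/work/recaser/cased.irstlm.gz implementation=IRSTLM
```

Taking the filename as everything between the second and the last colon means a path that itself contains a colon still parses correctly.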

And yeah, I was thinking as well that a --help option would be useful
in train-recaser. Maybe I'll write one too, unless someone beats me to
it! :-)

Jehan

On Fri, Nov 25, 2011 at 10:12 PM, Daniel Schaut <[email protected]> wrote:
> Hi Jehan,
>
> That's a nice idea, and thanks for the trick. :) I thought the lm switch
> could only be used with train-model. Apologies for my lack of
> knowledge. ;)
> So can all the switches listed in the reference
> http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters
> be used with train-recaser, too? If so, it would be worth adding a line
> about this to the manual.
>
> It would be nice to add a help switch to train-recaser, too.
>
> Daniel
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Jehan Pages
> Sent: Friday, 25 November 2011 03:54
> To: <[email protected]>
> Subject: Re: [Moses-support] Train recasing model using IRSTLM
>
> Hi all,
>
> Rather than having to search through the email archive, and since I guess
> we are not the only ones who won't use SRILM because it is proprietary (or
> for some other reason), I thought the best option would be to modify the
> existing script so that it can switch to IRSTLM when desired. I have just
> made a pull request on the Moses repository to update the
> train-recaser.perl script.
>
> Description:
> Note that by default the script still uses SRILM, so existing scripts
> that call the current version of train-recaser.perl will not break.
> To use IRSTLM instead of SRILM, simply add "-lm irstlm" to the command
> line.
> In case build-lm.sh is not in $PATH, there is also a new option,
> -build-lm, which lets you specify the path of the script to use (it
> must accept the same command-line syntax as build-lm.sh).
>
> I think this should be better in the long term. :-)
>
> Jehan
>
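A minimal POSIX sh sketch of how a wrapper might pick between the two cases described in the message above: plain "-lm irstlm" when build-lm.sh is on $PATH, and "-lm irstlm -build-lm <path>" otherwise. Only the -lm irstlm and -build-lm options come from the message; the recaser_lm_args function and the IRSTLM_BUILD_LM variable are hypothetical names used for illustration.

```sh
#!/bin/sh
# Sketch: print the LM-related options for train-recaser.perl.
# Without "-lm irstlm" the script keeps using SRILM (the default).
# IRSTLM_BUILD_LM is a HYPOTHETICAL variable holding the location of
# build-lm.sh, consulted only when build-lm.sh is not on $PATH.
recaser_lm_args() {
    if command -v build-lm.sh >/dev/null 2>&1; then
        # build-lm.sh is on $PATH, so "-lm irstlm" alone is enough.
        printf '%s\n' "-lm irstlm"
    elif [ -n "$IRSTLM_BUILD_LM" ]; then
        # Not on $PATH: pass its location through the new -build-lm option.
        printf '%s\n' "-lm irstlm -build-lm $IRSTLM_BUILD_LM"
    else
        printf '%s\n' "build-lm.sh not on \$PATH; set IRSTLM_BUILD_LM" >&2
        return 1
    fi
}

# Hypothetical usage (the unquoted expansion assumes no spaces in the path):
#   train-recaser.perl <other options> $(recaser_lm_args)
```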
> On Sun, Nov 13, 2011 at 12:58 AM, Daniel Schaut <[email protected]>
> wrote:
>> Dear all,
>>
>> I’m having some difficulties training the recasing model with IRSTLM.
>> I changed the train-recaser script according to
>>
>> http://www.mail-archive.com/[email protected]/msg01934.html
>>
>> but this results in an error which I don’t know how to fix.
>>
>> Error log:
>>
>> -----------------------------------------------------------------------
>>
>> (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011
>>
>> /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
>> --root-dir /home/user/moses/work/recaser --model-dir
>> /home/user/moses/work/recaser --first-step 4 --alignment a --corpus
>> /home/user/moses/work/recaser/aligned --f lowercased --e cased
>> --max-phrase-length 1 --lm
>> 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
>> /home/user/moses/mosestools/scripts-20111024-1127
>>
>> Can't exec
>> "/home/user/mosestools/scripts-20111024-1127/training/train-model.perl
>> ": No such file or directory at ./train-recaser.perl line 95.
>>
>> (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011
>>
>> -----------------------------------------------------------------------
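One thing worth checking in the log above: the script that could not be executed is /home/user/mosestools/scripts-20111024-1127/training/train-model.perl, while -scripts-root-dir points to /home/user/moses/mosestools/scripts-20111024-1127, which has an extra moses/ path component. A minimal sh sketch for confirming, before a long training run, that each script path exists and is executable. The check_scripts helper is hypothetical, and the second path in the usage comment assumes the same training/ layout under -scripts-root-dir.

```sh
#!/bin/sh
# Sketch: report every argument that is not an existing executable file.
# Returns 0 only if all given paths are usable.
check_scripts() {
    status=0
    for script in "$@"; do
        if [ ! -e "$script" ]; then
            printf 'missing: %s\n' "$script" >&2
            status=1
        elif [ ! -x "$script" ]; then
            printf 'not executable: %s\n' "$script" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage with the exec'd path from the log and the assumed path under
# -scripts-root-dir:
#   check_scripts /home/user/mosestools/scripts-20111024-1127/training/train-model.perl \
#                 /home/user/moses/mosestools/scripts-20111024-1127/training/train-model.perl
```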
>>
>> Then, instead of using build-lm.sh, I gave it another try by calling
>> compile-lm directly:
>>
>> my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz";
>>
>> where $CORPUS is a gzipped iARPA file.
>>
>> Error log:
>>
>> -----------------------------------------------------------------------
>>
>> (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011
>>
>> /home/nexoc/moses/work/recaser/aligned.lowercased
>>
>> utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
>> <CORPUS> line 1.
>>
>> Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70,
>> <CORPUS> line 1.
>>
>> -----------------------------------------------------------------------
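A likely explanation for the second error, though the thread does not confirm it: 0x8B is the second byte of the gzip magic number (1F 8B), so "\x8B" on line 1 suggests that the file read as UTF-8 text in step (3) was gzip-compressed, for instance the gzipped iARPA file that ended up in $CORPUS. A minimal sh sketch that tests a file for that signature; is_gzip is a hypothetical helper built on the POSIX od utility.

```sh
#!/bin/sh
# Sketch: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    # Dump the first two bytes as hex and drop od's spacing/newline.
    magic=$(od -A n -t x1 -N 2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Hypothetical usage: decompress before handing the file to a script
# that expects plain UTF-8 text, e.g.
#   if is_gzip "$f"; then gzip -dc "$f" > "${f%.gz}"; fi
```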
>>
>> Please see full error logs attached for more information.
>>
>> Could anyone give me a hint on how to train a recasing model with
>> either build-lm.sh or compile-lm? Help is very much appreciated.
>>
>> Thanks,
>>
>> Daniel
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
