Re: [Moses-support] Train recasing model using IRSTLM

Jehan Pages Sun, 27 Nov 2011 02:47:22 -0800

Hi,

I just wanted to say that I have now proposed a --help option as well, via a
pull request.
Hopefully it will be integrated too. :-)


Jehan

On Sat, Nov 26, 2011 at 1:05 AM, Jehan Pages <[email protected]> wrote:
> Hi,
>
> sorry if I was not clear. The version you most likely got from the
> Moses repository is indeed SRILM-only. The -lm option I wrote about is
> brand new: I wrote it today, made a pull request to the upstream Moses
> repository, and I see it was merged into the main repository about an
> hour ago.
>
> So if you pull the latest code now, you'll have this option. When you
> wrote your email, it was not yet available, hence no need to apologize
> (and even if this option had existed before, there would be no need to
> apologize either, by the way! Plus, I am only discovering Moses and
> its possibilities as well).
>
> Also I see you compare it with -lm in train-model.perl. The one I
> wrote has a different syntax.
>
> And yes, I had also been thinking that a --help option would be useful
> in train-recaser. Maybe I'll write one too, unless someone does it
> first! :-)
>
> Jehan
>
> On Fri, Nov 25, 2011 at 10:12 PM, Daniel Schaut <[email protected]> 
> wrote:
>> Hi Jehan,
>>
>> That's a nice idea and thanks for the trick. :) I thought the lm switch
>> could only be used in connection with train-model. Apologies for the lack of
>> knowledge. ;)
>> So, can all switches found in the reference
>> http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters
>> be called with train-recaser, too? If yes, a line in the manual could
>> mention this.
>>
>> Would be nice to add a help switch for train-recaser, too.
>>
>> Daniel
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On
>> behalf of Jehan Pages
>> Sent: Friday, 25 November 2011 03:54
>> To: <[email protected]>
>> Subject: Re: [Moses-support] Train recasing model using IRSTLM
>>
>> Hi all,
>>
>> rather than making people search through the email archive, and since I
>> guess we are not the only ones who won't use SRILM because it is proprietary
>> (or for some other reason), I thought it best to modify the existing script
>> so that it can switch to IRSTLM when desired. I have just made a pull
>> request on the Moses repository to update the train-recaser.perl script.
>>
>> Description:
>> Note that by default the script still uses SRILM, which keeps any existing
>> script that calls the current version of train-recaser.perl from breaking.
>> To use IRSTLM instead of SRILM, adding "-lm irstlm" on the command line is
>> enough.
>> In case build-lm.sh is not in $PATH, there is also a new option, -build-lm,
>> which lets you specify the path of the script to use (with build-lm.sh
>> command line syntax).
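
A minimal sketch of how the two options described above might be combined on a command line. It only prints the command rather than running it. The --dir and --corpus options and all paths are assumptions for illustration and should be checked against the installed version of train-recaser.perl; only "-lm irstlm" and -build-lm come from the description above.

```shell
#!/bin/sh
# Sketch only: prints a train-recaser.perl command instead of running it.
# RECASER_DIR, CASED_CORPUS and BUILD_LM are hypothetical placeholders;
# --dir and --corpus are assumed options, not confirmed by this thread.
RECASER_DIR="${RECASER_DIR:-/path/to/recaser}"
CASED_CORPUS="${CASED_CORPUS:-/path/to/corpus.cased}"
BUILD_LM="${BUILD_LM:-}"   # leave empty when build-lm.sh is already in $PATH

recaser_cmd() {
    # Without -lm the script keeps using SRILM; "-lm irstlm" selects IRSTLM.
    cmd="train-recaser.perl --dir $RECASER_DIR --corpus $CASED_CORPUS -lm irstlm"
    # Pass -build-lm only when build-lm.sh cannot be found through $PATH.
    if [ -n "$BUILD_LM" ]; then
        cmd="$cmd -build-lm $BUILD_LM"
    fi
    printf '%s\n' "$cmd"
}

recaser_cmd
```

Setting BUILD_LM to the location of build-lm.sh before calling recaser_cmd appends the -build-lm option; leaving it empty relies on $PATH.
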
>>
>> I think this should be better in the long term. :-)
>>
>> Jehan
>>
>> On Sun, Nov 13, 2011 at 12:58 AM, Daniel Schaut <[email protected]>
>> wrote:
>>> Dear all,
>>>
>>>
>>>
>>> I’m having some difficulty training the recasing model with IRSTLM.
>>> I changed the train-recaser script according to
>>>
>>> http://www.mail-archive.com/[email protected]/msg01934.html
>>>
>>> but this results in an error which I don’t know how to fix.
>>>
>>>
>>>
>>> Error log:
>>>
>>> -----------------------------------------------------------------------
>>>
>>> (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011
>>>
>>> /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
>>> --root-dir /home/user/moses/work/recaser --model-dir
>>> /home/user/moses/work/recaser --first-step 4 --alignment a --corpus
>>> /home/user/moses/work/recaser/aligned --f lowercased --e cased
>>> --max-phrase-length 1 --lm
>>> 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
>>> /home/user/moses/mosestools/scripts-20111024-1127
>>>
>>> Can't exec
>>> "/home/user/mosestools/scripts-20111024-1127/training/train-model.perl
>>> ": No such file or directory at ./train-recaser.perl line 95.
>>>
>>>
>>>
>>> (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011
>>>
>>> -----------------------------------------------------------------------
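
The "Can't exec ... No such file or directory" message in the log above means the train-model.perl path could not be found. Note that the log calls the script under /home/user/mosestools/... while -scripts-root-dir points to /home/user/moses/mosestools/..., so a wrong scripts root is one plausible cause. A minimal sketch of a check to run before training, assuming a hypothetical SCRIPTS_ROOT variable and the training/ subdirectory layout seen in the log:

```shell
#!/bin/sh
# Sketch only: verify that train-model.perl exists under a scripts root.
# SCRIPTS_ROOT is a hypothetical variable; the training/ subdirectory is
# taken from the path shown in the error log.
check_train_script() {
    script="$1/training/train-model.perl"
    if [ -f "$script" ]; then
        echo "found: $script"
        return 0
    fi
    echo "missing: $script" >&2
    return 1
}

SCRIPTS_ROOT="${SCRIPTS_ROOT:-/path/to/scripts}"
check_train_script "$SCRIPTS_ROOT" || echo "fix the scripts root before training"
```
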
>>>
>>>
>>>
>>> Then, instead of using build-lm.sh, I gave it another try, calling
>>> compile-lm directly:
>>>
>>> my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz";
>>>
>>> where $CORPUS is a gzip iARPA file.
>>>
>>>
>>>
>>> Error log:
>>>
>>> -----------------------------------------------------------------------
>>>
>>> (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011
>>>
>>> /home/nexoc/moses/work/recaser/aligned.lowercased
>>>
>>> utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
>>> <CORPUS> line 1.
>>>
>>> Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70,
>>> <CORPUS> line 1.
>>>
>>> -----------------------------------------------------------------------
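
The byte "\x8B" in the log above is the second byte of the gzip magic number (1f 8b), which suggests that the file read as the corpus was still gzip-compressed while the script expected UTF-8 text. This is a reading of the error message, not a confirmed diagnosis. A minimal sketch, with hypothetical function names, that detects gzip data by its magic bytes and prints plain text either way:

```shell
#!/bin/sh
# Sketch only: print a file as plain text, decompressing it first when it
# starts with the gzip magic bytes 1f 8b.
is_gzip() {
    # od prints the first two bytes as hex (" 1f 8b"); tr strips the blanks.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

plain_text() {
    if is_gzip "$1"; then
        gzip -dc "$1"
    else
        cat "$1"
    fi
}

# Example: pass a file name as the first argument to print it as plain text.
if [ "$#" -gt 0 ]; then
    plain_text "$1"
fi
```

Redirecting the output of plain_text to a new file gives an uncompressed copy that can then be handed to a script that reads its input as text.
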
>>>
>>>
>>>
>>> Please see full error logs attached for more information.
>>>
>>>
>>>
>>> Could anyone give me a hint on how to train a recasing model with
>>> either build-lm.sh or compile-lm? Help is very much appreciated.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Daniel
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>
>>
>

