Hi all,

Rather than making everyone search through the email archive (I guess
we are not the only ones who won't use SRILM, because it is proprietary
or for some other reason), I thought it best to modify the existing
script so that it can switch to IRSTLM when desired. I have just made a
pull request on the Moses repository that updates the
train-recaser.perl script.

Description:
Note that by default the script still uses SRILM, so any existing
script that calls the current version of train-recaser.perl will not
break.
To use IRSTLM instead of SRILM, adding "-lm irstlm" to the command
line is enough.
If build-lm.sh is not in $PATH, there is also a new option, -build-lm,
which lets you specify the path of the script to use (it is called
with the build-lm.sh command-line syntax).
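
To make the three cases concrete, here is a rough dry-run sketch that
only prints the command lines and does not run anything. The corpus
name, the output directory and the /opt/irstlm path are placeholders I
made up for illustration, and I am assuming the usual --dir and
--corpus options of train-recaser.perl; only -lm and -build-lm are the
options described above.

```shell
#!/bin/sh
# Dry-run helper: prints a train-recaser.perl command line, never executes it.
# Assumptions for illustration only: "recaser", "corpus.cased.txt" and the
# /opt/irstlm path are placeholders; --dir and --corpus are assumed to be
# the script's usual options.

recaser_cmd() {
    lm="$1"          # "srilm" (the default behaviour) or "irstlm"
    build_lm="$2"    # optional path to build-lm.sh; empty means "use $PATH"

    cmd="train-recaser.perl --dir recaser --corpus corpus.cased.txt"
    if [ "$lm" = "irstlm" ]; then
        # Switch the language model toolkit from SRILM to IRSTLM.
        cmd="$cmd -lm irstlm"
        if [ -n "$build_lm" ]; then
            # Only needed when build-lm.sh is not found in $PATH.
            cmd="$cmd -build-lm $build_lm"
        fi
    fi
    printf '%s\n' "$cmd"
}

recaser_cmd srilm                                # unchanged default invocation
recaser_cmd irstlm                               # IRSTLM, build-lm.sh in $PATH
recaser_cmd irstlm /opt/irstlm/bin/build-lm.sh   # IRSTLM, explicit script path
```

The first printed line is what existing callers already use, which is
why they should keep working without changes.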

I think this should be better in the long term. :-)

Jehan

On Sun, Nov 13, 2011 at 12:58 AM, Daniel Schaut <[email protected]> wrote:
> Dear all,
>
>
>
> I’m having some difficulty training the recasing model with IRSTLM. I
> changed the train-recaser script according to
>
> http://www.mail-archive.com/[email protected]/msg01934.html
>
> but this results in an error which I don’t know how to fix.
>
>
>
> Error log:
>
> -----------------------------------------------------------------------
>
> (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011
>
> /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
> --root-dir /home/user/moses/work/recaser --model-dir
> /home/user/moses/work/recaser --first-step 4 --alignment a --corpus
> /home/user/moses/work/recaser/aligned --f lowercased --e cased
> --max-phrase-length 1 --lm
> 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
> /home/user/moses/mosestools/scripts-20111024-1127
>
> Can't exec
> "/home/user/mosestools/scripts-20111024-1127/training/train-model.perl": No
> such file or directory at ./train-recaser.perl line 95.
>
>
>
> (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011
>
> -----------------------------------------------------------------------
>
>
>
> Then instead of using build-lm.sh, I gave it another try calling compile-lm
> directly:
>
> my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS
> /dev/stdout | gzip -c > $DIR/cased.irstlm.gz";
>
> where $CORPUS is a gzipped iARPA file.
>
>
>
> Error log:
>
> -----------------------------------------------------------------------
>
> (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET
> 2011
>
> /home/nexoc/moses/work/recaser/aligned.lowercased
>
> utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
> <CORPUS> line 1.
>
> Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, <CORPUS>
> line 1.
>
> -----------------------------------------------------------------------
>
>
>
> Please see full error logs attached for more information.
>
>
>
> Could anyone give me a hint on how to train a recasing model with either
> build-lm.sh or compile-lm? Help is very much appreciated.
>
>
>
> Thanks,
>
> Daniel
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

