Hi all,

Rather than making people search through the email archive (I guess we are not the only ones who won't use SRILM, because it is proprietary or for some other reason), I thought it best to modify the existing script so that it can switch to IRSTLM when desired. I have just made a pull request on the Moses repository that updates the train-recaser.perl script.
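In case it helps, here is a rough sketch of what a call with the new switch could look like. Only "-lm irstlm" and "-build-lm" come from this change; the paths are placeholders, and the -dir and -corpus options are written as I understand the existing script, so please check them against your copy. The snippet only prints the command (a dry run) rather than running it.

```shell
#!/bin/sh
# Placeholder locations (assumptions): adjust to your own installation.
RECASER_DIR="/path/to/work/recaser"
CORPUS="/path/to/corpus.cased.txt"
BUILD_LM="/path/to/irstlm/bin/build-lm.sh"

# "-lm irstlm" selects IRSTLM instead of the default SRILM.
# "-build-lm" is only needed when build-lm.sh is not in $PATH.
# -dir and -corpus are assumed to be the script's existing options.
CMD="train-recaser.perl -dir $RECASER_DIR -corpus $CORPUS -lm irstlm -build-lm $BUILD_LM"

# Dry run: show the command instead of executing it.
echo "$CMD"
```

Leaving out "-lm irstlm" should keep the old SRILM behaviour, so existing calls do not need to change.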
Description:

Note that by default the script will still use SRILM, which avoids breaking any existing script that calls the current version of train-recaser.perl. To use IRSTLM instead of SRILM, adding "-lm irstlm" on the command line is enough. In case build-lm.sh is not in $PATH, there is also a new option, -build-lm, which lets you specify the path of the script to use (with build-lm.sh command line syntax).

I think this should be better in the long term. :-)

Jehan

On Sun, Nov 13, 2011 at 12:58 AM, Daniel Schaut <[email protected]> wrote:
> Dear all,
>
> I’m having some difficulties to train the recasing model with IRSTLM. I
> changed the train-recaser script according to
>
> http://www.mail-archive.com/[email protected]/msg01934.html
>
> but this results in an error which I don’t know how to fix.
>
> Error log:
> -----------------------------------------------------------------------
> (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011
> /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
> --root-dir /home/user/moses/work/recaser --model-dir
> /home/user/moses/work/recaser --first-step 4 --alignment a --corpus
> /home/user/moses/work/recaser/aligned --f lowercased --e cased
> --max-phrase-length 1 --lm
> 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
> /home/user/moses/mosestools/scripts-20111024-1127
>
> Can't exec
> "/home/user/mosestools/scripts-20111024-1127/training/train-model.perl": No
> such file or directory at ./train-recaser.perl line 95.
>
> (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011
> -----------------------------------------------------------------------
>
> Then instead of using build-lm.sh, I gave it another try calling compile-lm
> directly:
>
> my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS
> /dev/stdout | gzip -c > $DIR/cased.irstlm.gz
>
> where $CORPUS is a gzip iARPA file.
>
> Error log:
> -----------------------------------------------------------------------
> (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET
> 2011
> /home/nexoc/moses/work/recaser/aligned.lowercased
> utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
> <CORPUS> line 1.
> Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, <CORPUS>
> line 1.
> -----------------------------------------------------------------------
>
> Please see full error logs attached for more information.
>
> Could anyone give me a hint on how to train a recasing model with either
> build-lm.sh or compile-lm? Help is very much appreciated.
>
> Thanks,
> Daniel
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
