Hi,

It looks like your training data isn't valid UTF-8.  Either convert it
to UTF-8 with iconv or scrub the invalid bytes first.
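
A minimal sketch of how the corpus could be checked before rerunning
train-recaser.perl, written in Python. The function names are hypothetical
and are not part of Moses or IRSTLM. One thing worth ruling out first: 0x8B
is the second byte of the gzip magic number (1F 8B), so an error on "\x8B"
at line 1 is consistent with a gzipped file being read as plain text. That
is only an assumption about this case, so the sketch checks for it and
otherwise reports the first invalid UTF-8 byte.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def scrub_utf8(data: bytes) -> bytes:
    """Drop bytes that are not part of a valid UTF-8 sequence."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def check_corpus(data: bytes) -> str:
    """Describe the corpus: gzipped or not, and whether it is valid UTF-8."""
    prefix = ""
    if looks_gzipped(data):
        # Decompress first, otherwise the gzip header itself fails to decode.
        data = gzip.decompress(data)
        prefix = "gzip-compressed; after decompression: "
    offset = first_invalid_utf8(data)
    if offset is None:
        return prefix + "valid UTF-8"
    return prefix + f"invalid UTF-8 at byte {offset}"
```

To use it, read the corpus in binary mode (open(path, "rb").read()) and pass
the bytes to check_corpus. If the data turns out to be in a known legacy
encoding, converting it with iconv is preferable to scrub_utf8, which simply
discards the offending bytes.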

Kenneth

On 11/12/11 15:58, Daniel Schaut wrote:
> Dear all,
> 
>  
> 
> I’m having some difficulty training the recasing model with IRSTLM. I
> changed the train-recaser script according to
> 
> http://www.mail-archive.com/[email protected]/msg01934.html
> 
> but this results in an error which I don’t know how to fix.
> 
>  
> 
> Error log:
> 
> -----------------------------------------------------------------------
> 
> (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011
> 
> /home/user/mosestools/scripts-20111024-1127/training/train-model.perl
> --root-dir /home/user/moses/work/recaser --model-dir
> /home/user/moses/work/recaser --first-step 4 --alignment a --corpus
> /home/user/moses/work/recaser/aligned --f lowercased --e cased
> --max-phrase-length 1 --lm
> 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir
> /home/user/moses/mosestools/scripts-20111024-1127
> 
> Can't exec
> "/home/user/mosestools/scripts-20111024-1127/training/train-model.perl":
> No such file or directory at ./train-recaser.perl line 95.
> 
>  
> 
> (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011
> 
> -----------------------------------------------------------------------
> 
>  
> 
> Then instead of using build-lm.sh, I gave it another try calling
> compile-lm directly:
> 
> my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm
> $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz";
> 
> where $CORPUS is a gzipped iARPA file.
> 
>  
> 
> Error log:
> 
> -----------------------------------------------------------------------
> 
> (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET
> 2011
> 
> /home/nexoc/moses/work/recaser/aligned.lowercased
> 
> utf8 "\x8B" does not map to Unicode at ./train-recaser.perl line 64,
> <CORPUS> line 1.
> 
> Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70,
> <CORPUS> line 1.
> 
> -----------------------------------------------------------------------
> 
>  
> 
> Please see full error logs attached for more information.
> 
>  
> 
> Could anyone give me a hint on how to train a recasing model with either
> build-lm.sh or compile-lm? Help is very much appreciated.
> 
>  
> 
> Thanks,
> 
> Daniel
> 
>  
> 
> 
> 
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
