Hi Huy.

Moses uses the '|' character as the delimiter between factors. This means
that if your corpus contains a '|' character, the Moses scripts will treat
it as a factor delimiter, which is what causes the "blank factor" message
you saw.
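To see which tokens are affected, the sketch below splits every whitespace-separated token on '|' and prints the ones that would produce an empty factor (for example a lone "|" or "a||b"). The function name find_blank_factors is made up for this example, and it only approximates the check done by clean-corpus-n.perl; it is not that script's actual code.

```shell
#!/bin/sh
# Illustration only (not the code used by clean-corpus-n.perl): split each
# whitespace-separated token on '|', the factor delimiter, and print the
# tokens that would yield an empty factor, e.g. a lone "|" or "a||b".
# Usage: find_blank_factors FILE
find_blank_factors() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # "[|]" is a bracket expression, so the bar is matched literally.
      n = split($i, parts, "[|]")
      for (j = 1; j <= n; j++) {
        if (parts[j] == "") {
          print "line " NR ": token \"" $i "\""
          break
        }
      }
    }
  }' "$1"
}
```

For example, find_blank_factors working-dir/corpus/europarl.tok.en. A token such as "x|y" is not reported, since neither part is empty, but it is still read as two factors, so every '|' should be replaced regardless.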

To avoid this problem, replace all occurrences of this character with
something else, e.g.:

sed 's/|/_V_BAR_/g' < corpus > newcorpus

Remember to perform this replacement on all the corpus subsets (i.e.
training, development, devtest and test), and also on the data you used
to train your language model.
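If you have several files to convert, a small loop saves repeating the command by hand. This is a minimal sketch: the function name escape_bars_in_files and its OUTDIR argument are hypothetical, not part of Moses. It applies the same sed substitution to each file, writes the result under the same file name in a separate directory (so "stem.lang" names such as europarl.tok.en keep working with clean-corpus-n.perl), and reports how many lines contained a '|'.

```shell
#!/bin/sh
# Sketch: apply the '|' -> _V_BAR_ substitution to every input file, writing
# a copy with the same file name into OUTDIR.
# Usage: escape_bars_in_files OUTDIR FILE...
escape_bars_in_files() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  for f in "$@"; do
    # Refuse to write into the input's own directory: the redirection would
    # truncate the input file before sed could read it.
    src_dir=$(cd "$(dirname "$f")" && pwd) || return 1
    if [ "$src_dir" = "$(cd "$outdir" && pwd)" ]; then
      echo "error: $f is already in $outdir" >&2
      return 1
    fi
    # Number of lines containing a literal '|' (grep -c prints 0 and exits
    # non-zero when there are none, hence the "|| true").
    n=$(grep -c -F '|' "$f" || true)
    sed 's/|/_V_BAR_/g' < "$f" > "$outdir/$(basename "$f")" || return 1
    echo "$f: $n line(s) changed"
  done
}
```

For example: escape_bars_in_files working-dir/corpus-novbar working-dir/corpus/europarl.tok.fr working-dir/corpus/europarl.tok.en, adding your development, devtest, test and language-model files (under whatever names they have in your setup) to the list. The output directory name here is only an illustration.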

Good luck,

Germán Sanchis

Quoting Nguyen Tien Huy <[EMAIL PROTECTED]>:

> Dear Moses-support
>
> I saw the message "There is a blank factor in
> working-dir/corpus/europarl.tok.en..." after running the
> "bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/clean-corpus-n.perl
> working-dir/corpus/europarl.tok fr en
> working-dir/corpus/europarl.clean 1 40" command.
>
> What does this message mean? Is there an error in the europarl.tok.en file?
> Can I go ahead and run the "Lowercase training data" step?
>
> Thanks in advance.
> Huy.
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support