Hi Huy,

Moses uses the '|' character as the delimiter between factors. If your corpus itself contains a '|' character, the Moses scripts interpret it as a factor delimiter, and that is what causes this error.
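One quick way to see whether a corpus is affected is to split every token on '|' and look for empty pieces. The following is a minimal sketch. It assumes a tokenized, space-separated corpus in which every '|' inside a token starts a new factor. The function name report_blank_factors is hypothetical and not part of the Moses scripts, and the function does not reproduce the scripts' own check.

```shell
#!/bin/sh
# report_blank_factors FILE
# Hypothetical helper (not part of Moses). It assumes every '|' inside a
# token separates two factors, splits each token on it, and prints
# "line N: TOKEN" for every token that would yield an empty (blank) factor.
report_blank_factors() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # /[|]/ is a bracket expression, so it matches a literal pipe
      n = split($i, factor, /[|]/)
      for (j = 1; j <= n; j++) {
        if (factor[j] == "") {
          printf "line %d: %s\n", NR, $i
          break   # report each token only once
        }
      }
    }
  }' "$1"
}
```

For example, `report_blank_factors working-dir/corpus/europarl.tok.en` lists tokens such as a bare `|` or `end|` together with their line numbers. A token like `a|b` is not reported, even though it would also be read as two factors. Replacing every '|' is therefore still the safer fix.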
To avoid this problem, replace every occurrence of this character with something else, e.g.:

    sed 's/|/_V_BAR_/g' < corpus > newcorpus

However, remember to perform this operation on all the corpus subsets (i.e. training, development, devtest and test) and also on the data you used to train your language model.

Good luck,
Germán Sanchis

Quoting Nguyen Tien Huy <[EMAIL PROTECTED]>:

> Dear Moses-support,
>
> I saw the message "There is a blank factor in
> working-dir/corpus/europarl.tok.en..." after I ran the
> "bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/clean-corpus-n.perl
> working-dir/corpus/europarl.tok fr en
> working-dir/corpus/europarl.clean 1 40" command.
>
> What does this message mean? Is there an error in the europarl.tok.en file?
> Can I run "Lowercase training data"?
>
> Thanks in advance.
> Huy.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
