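Hi Nakul,

The clean-corpus script has to process *both* source and target at the same time, so you run it once for the pair, not once per language. The corpus should consist of two line-aligned files that share a stem and differ only in the language extension, e.g. corpus.hi and corpus.en. You then pass the stem and the two extensions:

    ./clean-corpus-n.perl corpus en hi corpus.clean 1 50

which writes corpus.clean.en and corpus.clean.hi.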
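To illustrate why both sides have to go through the cleaning step together, here is a simplified Python sketch of length-based cleaning of a parallel corpus. It is not the code of clean-corpus-n.perl (the real script does more, e.g. a source/target length-ratio check), and the function names `clean_parallel` and `read_lines` are hypothetical. The point it shows is that each sentence pair is kept or dropped as a unit, so the two output files stay line-aligned; cleaning each language on its own would drop different lines on each side and break the alignment.

```python
from pathlib import Path


def read_lines(path: Path) -> list[str]:
    # Decode strictly as UTF-8: malformed bytes raise UnicodeDecodeError,
    # so broken encoding is reported here instead of passing through silently.
    # newline="\n" means only "\n" ends a line (no other separators split it).
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def clean_parallel(stem: str, l1: str, l2: str, out: str,
                   min_len: int, max_len: int) -> tuple[int, int]:
    """Hypothetical sketch: read stem.l1 and stem.l2, keep line i only if
    BOTH sides have min_len..max_len tokens, write out.l1 and out.l2.
    Returns (number of input pairs, number of kept pairs)."""
    src_path, tgt_path = Path(f"{stem}.{l1}"), Path(f"{stem}.{l2}")
    for p in (src_path, tgt_path):
        if not p.is_file():
            # A missing input file is a likely cause of a "Can't open ''" error.
            raise FileNotFoundError(f"expected corpus file not found: {p}")

    src, tgt = read_lines(src_path), read_lines(tgt_path)
    if len(src) != len(tgt):
        # A parallel corpus must have exactly one target line per source line.
        raise ValueError(
            f"{src_path} has {len(src)} lines but {tgt_path} has {len(tgt)}")

    # Keep or drop each sentence PAIR as a unit so the outputs stay aligned.
    kept = [(s, t) for s, t in zip(src, tgt)
            if min_len <= len(s.split()) <= max_len
            and min_len <= len(t.split()) <= max_len]

    with open(f"{out}.{l1}", "w", encoding="utf-8", newline="\n") as f_src, \
         open(f"{out}.{l2}", "w", encoding="utf-8", newline="\n") as f_tgt:
        for s, t in kept:
            f_src.write(s + "\n")
            f_tgt.write(t + "\n")
    return len(src), len(kept)
```

For example, `clean_parallel("corpus", "en", "hi", "corpus.clean", 1, 50)` would read corpus.en and corpus.hi and write corpus.clean.en and corpus.clean.hi, mirroring the argument order of the single clean-corpus-n.perl call.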
This guide should help: http://www.statmt.org/moses_steps.html

best regards

- Barry

On Thursday 03 February 2011 03:21, nakul sharma wrote:
> Hi all,
>
> I have installed the latest version of Moses from sourceforge.net.
>
> I am just checking: do we need to pass the corpora of both languages
> (both source and target) as input to clean-corpus-n.perl? I ran the
> script for each language and got the following messages:
>
> For Source:
>
> ./clean-corpus-n.perl 200EnglishSens en hi 200EnglishSens.clean 1 50
> clean-corpus.perl: processing 200EnglishSens.en & .hi to
> 200EnglishSens.clean, cutoff 1-50
>
> Input sentences: 203 Output sentences: 187
>
> For Target:
>
> ./clean-corpus-n.perl 200HindiSens hi en 200HindiSens.clean 1 50
> clean-corpus.perl: processing 200HindiSens.hi & .en to 200HindiSens.clean,
> cutoff 1-50
> Use of uninitialized value $opn in open at ./clean-corpus-n.perl line 46.
> Use of uninitialized value $opn in concatenation (.) or string at
> ./clean-corpus-n.perl line 46.
> Can't open '' at ./clean-corpus-n.perl line 46
>
> So the problem again seems to be with the target language. How do I solve
> this problem of broken UTF, as Tom pointed out?

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
