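Hi Nakul,

The clean-corpus script has to process *both* source and target at the same
time, in a single run. The corpus should consist of two line-aligned files
that share one stem and differ only in the language extension, e.g.
corpus.hi and corpus.en. You then pass the stem ("corpus") and the two
language codes to clean-corpus-n.perl once, rather than running it
separately for each language.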

The following guide should help:
http://www.statmt.org/moses_steps.html
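
As a rough sketch of the layout described above, the snippet below checks that both files for a stem exist and have the same number of lines before the cleaning script is called. Note that check_parallel is a hypothetical helper written for this illustration and is not part of Moses; the stem "corpus" and the 1-50 length cutoff are taken from the examples in this thread.

```shell
#!/bin/sh
# check_parallel STEM L1 L2
# Hypothetical helper (not part of Moses): verifies that STEM.L1 and STEM.L2
# both exist and contain the same number of lines.
check_parallel() {
    stem=$1; l1=$2; l2=$3
    for lang in "$l1" "$l2"; do
        if [ ! -f "$stem.$lang" ]; then
            echo "missing file: $stem.$lang" >&2
            return 1
        fi
    done
    # Arithmetic expansion strips any padding that wc adds on some systems.
    n1=$(( $(wc -l < "$stem.$l1") ))
    n2=$(( $(wc -l < "$stem.$l2") ))
    if [ "$n1" -ne "$n2" ]; then
        echo "line counts differ: $stem.$l1=$n1 $stem.$l2=$n2" >&2
        return 1
    fi
    return 0
}

# Intended use: one call for the language pair, not one call per language.
# check_parallel corpus hi en && ./clean-corpus-n.perl corpus hi en corpus.clean 1 50
```

With corpus.hi and corpus.en side by side in the current directory, the commented command at the end runs the cleaning step once for the pair.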

best regards - Barry

On Thursday 03 February 2011 03:21, nakul sharma wrote:
> Hi all,
>
>
>
> I have installed the latest version of Moses from sourceforge.net.
>
> Just to clarify: do we need to supply the corpus for both languages
> (source and target) as input to clean-corpus-n.perl? I ran the script
> for each language and got the following messages:
>
> For Source:-
>
> ./clean-corpus-n.perl 200EnglishSens en hi 200EnglishSens.clean 1 50
> clean-corpus.perl: processing 200EnglishSens.en & .hi to
> 200EnglishSens.clean, cutoff 1-50
>
> Input sentences: 203  Output sentences:  187
>
> For Target :-
>
> ./clean-corpus-n.perl 200HindiSens hi en 200HindiSens.clean 1 50
> clean-corpus.perl: processing 200HindiSens.hi & .en to 200HindiSens.clean,
> cutoff 1-50
> Use of uninitialized value $opn in open at ./clean-corpus-n.perl line 46.
> Use of uninitialized value $opn in concatenation (.) or string at
> ./clean-corpus-n.perl line 46.
> Can't open '' at ./clean-corpus-n.perl line 46
>
> So the problem again seems to be with the target language. How can I solve
> this problem of broken UTF, as pointed out by Tom?

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
