Hi,

the corpus filtering script that you are using expects a parallel corpus in the
form of two files, where line n of one file is the translation of line n of the
other. Hence, the two files need to have the same number of lines.

You get the quoted error message if the two files have different numbers of
lines, which is not the right starting point for this process. Either the
data is bad, or you have to run a sentence aligner first.
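
A quick way to confirm the mismatch before running the filtering script is to
count the lines on both sides. The following is only a rough sketch in Python,
not part of Moses; the function names are made up for illustration, and the
file names in the note below are an assumption based on your command
(corpus.tok.low plus the de and en extensions).

```python
def count_lines(path):
    """Count the lines in a file.

    The file is read in binary mode so the count does not depend on the
    text encoding. A final line without a trailing newline still counts.
    """
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def line_counts(src_path, tgt_path):
    """Return (source lines, target lines, difference) for the two sides.

    A non-zero difference means the files are not line-aligned and the
    filtering step cannot pair up the sentences.
    """
    src = count_lines(src_path)
    tgt = count_lines(tgt_path)
    return src, tgt, src - tgt
```

Calling line_counts("corpus.tok.low.de", "corpus.tok.low.en") should give a
difference of 0 for a usable parallel corpus. Anything else points to bad data
or to text that still needs sentence alignment.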

-phi

On Tue, Jul 16, 2013 at 6:48 AM, Cyrine NASRI <[email protected]> wrote:
>
> Hello,
>
> I'm trying to filter out long sentences using clean-corpus-n.perl, but it dies
> after a while saying "europarl.tok.fr is too short!"
>
> This is what I do:
>
> clean-corpus-n.perl corpus.tok.low de en clean 1 50
>
> Could someone please tell me if there is something obvious that I'm missing?
> Regards,
>
> Cyrine
>
>
> --
> Cyrine NASRI
> Ph.D. Student in Computer Science
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
