Thank you

Best regards

Cyrine


2013/7/17 Philipp Koehn <[email protected]>

> Hi,
>
> the corpus filtering script that you are using expects a parallel corpus in
> the format of two files, with corresponding lines referring to parallel
> sentences. Hence, they need to have the same number of lines.
>
> You will get the quoted error message if the two files have a different
> number of lines, which is not the right starting point for this process.
> This may be bad data, or you may have to run a sentence aligner first.
>
> -phi
>
> On Tue, Jul 16, 2013 at 6:48 AM, Cyrine NASRI <[email protected]>
> wrote:
> >
> > Hello,
> >
> > I'm trying to filter out long sentences using clean-corpus-n.perl, but it
> > dies after a while, saying "europarl.tok.fr is too short!"
> >
> > This is what I do:
> >
> > clean-corpus-n.perl corpus.tok.low de en clean 1 50
> >
> > Could someone please tell me if there is something obvious that I'm
> > missing?
> > Regards,
> >
> > Cyrine
> >
> >
> > --
> > Cyrine NASRI
> > Ph.D. Student in Computer Science
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>
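
The line-count requirement described in the reply above can be checked before
running the cleaning script. The following is only a minimal sketch: the
helper names count_lines and check_parallel are hypothetical, and the file
names corpus.tok.low.de and corpus.tok.low.en are an assumption inferred from
the stem and language codes in the quoted command.

```python
import sys


def count_lines(path):
    """Count the lines in a file; binary mode avoids encoding problems."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_parallel(source_path, target_path):
    """Return (source_lines, target_lines, same_length) for a corpus pair."""
    n_source = count_lines(source_path)
    n_target = count_lines(target_path)
    return n_source, n_target, n_source == n_target


# Hypothetical command-line use: pass the two sides of the corpus as arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    n_source, n_target, same = check_parallel(sys.argv[1], sys.argv[2])
    print(f"{sys.argv[1]}: {n_source} lines")
    print(f"{sys.argv[2]}: {n_target} lines")
    if not same:
        # A mismatch means the files are not line-aligned, so the data
        # needs fixing or sentence alignment before filtering.
        sys.exit("Line counts differ; the corpus is not line-aligned.")
```

Saved under a name of your choice, it would be called with the two corpus
files as arguments, for example corpus.tok.low.de and corpus.tok.low.en. Equal
counts are necessary but do not prove that the sentences actually correspond.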



-- 
Cyrine NASRI
Ph.D. Student in Computer Science
