Hi,

simply combine all the files for each language into one file:

% cat de-en/de/* > corpus.de-en.de
% cat de-en/en/* > corpus.de-en.en
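
The two cat commands keep the corpus parallel only if both directories hold the same file names, since the shell expands * in sorted order on each side. If you want a safety check, here is a rough sketch of the same merge done file by file. The function name merge_parallel and its arguments are made up for illustration and are not part of the Moses scripts; it assumes every file ends with a newline.

```shell
# Hypothetical helper (not part of Moses):
#   merge_parallel <aligned-dir> <src-lang> <tgt-lang> <out-prefix>
merge_parallel() {
    dir=$1; src=$2; tgt=$3; out=$4

    # Start from empty output files.
    : > "$out.$src"
    : > "$out.$tgt"

    for f in "$dir/$src"/*; do
        name=$(basename "$f")
        # Skip any source file that has no counterpart on the target side.
        if [ ! -f "$dir/$tgt/$name" ]; then
            echo "skipping $name: no $tgt counterpart" >&2
            continue
        fi
        cat "$f" >> "$out.$src"
        cat "$dir/$tgt/$name" >> "$out.$tgt"
    done

    # Both merged files must end up with the same number of lines.
    n_src=$(wc -l < "$out.$src" | tr -d ' ')
    n_tgt=$(wc -l < "$out.$tgt" | tr -d ' ')
    if [ "$n_src" -ne "$n_tgt" ]; then
        echo "line count mismatch: $n_src vs $n_tgt" >&2
        return 1
    fi
}
```

With the layout from your mail this would be called as: merge_parallel de-en de en corpus.de-en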
-phi

On 12/27/07, Pradeep Muthukrishnan <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I got the sentence-align-corpus to work properly, but every other script
> that needs to be run like clean-corpus-n.perl needs just two files, the
> source language file and the target language file. But after
> sentence-align-corpus I have a lot of German files in
> /data0/tools/mosesdecoder/scripts/training/europarl/aligned/de-en/de.
> The files are named like
> ep-00-09-07.txt, etc.
>
> Similarly all the English files are in
> /data0/tools/mosesdecoder/scripts/training/europarl/aligned/de-en/de.
> The files are named like
> ep-00-09-07.txt, etc.
>
> How do I merge all these files into corpus.de and corpus.en?
>
> Someone please help me with this. I have been working on this for quite some
> time now. Thanks for your time!
>
> regards,
> Pradeep
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
