Re: [Moses-support] English -> Russian Parallel Corpora

Ben Houghton Tue, 19 Apr 2011 03:00:39 -0700

Hi guys,


Thanks for the suggestions. I have downloaded the EuroMatrixPlus corpora
and extracted the English and Russian files to the text folder using
extract.py. Initially I just took all the file pairs whose line counts
matched, but that only gives me a corpus of around 500,000 lines. I
noticed that most of the file pairs don't have matching line counts, and
many contain text that their counterpart does not, e.g.
A_53_647_CORR1_(en)(ru).
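
For reference, the line-count filter described above can be sketched
roughly as follows. This is only an illustration, not the actual script
used: it assumes the extracted files sit in two hypothetical directories
(one per language) and that counterpart files share the same file name,
which may not match the real layout produced by extract.py.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file without decoding it."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def matching_pairs(en_dir, ru_dir):
    """Yield (en_path, ru_path, n_lines) for same-named files whose
    line counts are equal. The directory layout is an assumption."""
    en_dir, ru_dir = Path(en_dir), Path(ru_dir)
    for en_path in sorted(en_dir.iterdir()):
        ru_path = ru_dir / en_path.name
        # Skip anything without a counterpart in the other language.
        if not en_path.is_file() or not ru_path.is_file():
            continue
        n_en = count_lines(en_path)
        if n_en == count_lines(ru_path):
            yield en_path, ru_path, n_en
```

Note that an equal line count does not guarantee that line N in one
file is a translation of line N in the other, so pairs kept this way
may still contain misaligned lines.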

 

What would be the steps involved in getting these files strictly
aligned?

 

Thanks

 

Ben

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
