[Apertium-stuff] Cleaning Parallel Corpus

VIVEK VICKY Wed, 28 Apr 2021 11:49:29 -0700

Hello everyone,
The eng-spa parallel corpora I am using (http://www.statmt.org/europarl/,
http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) have empty lines
on one side or the other, because a sentence was split into two, or two
sentences were merged into one, during translation. These empty lines cause
errors during lexical training. Is this common in parallel corpora, or is
there a clean parallel corpus available?
Right now, I am translating the sentences around the empty lines (the lines
above and below them) and merging or splitting them by hand. Is there a better
way to do this?
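A simpler alternative, if losing the affected pairs is acceptable, is to discard every pair in which either side is empty while keeping the two files line-aligned. The sketch below assumes line i of the source file corresponds to line i of the target file; `filter_empty_pairs` and `clean_files` are hypothetical helpers, not part of Apertium or any existing tool.

```python
from itertools import zip_longest


def filter_empty_pairs(src_lines, tgt_lines):
    """Return (source, target) pairs in which both sides contain text.

    Assumes line i of the source corresponds to line i of the target.
    """
    kept = []
    for src, tgt in zip_longest(src_lines, tgt_lines):
        # zip_longest pads the shorter input with None, which means the
        # two files do not have the same number of lines.
        if src is None or tgt is None:
            raise ValueError("source and target have different line counts")
        # Drop the pair if either side is empty or whitespace-only.
        if src.strip() and tgt.strip():
            kept.append((src.rstrip("\n"), tgt.rstrip("\n")))
    return kept


def clean_files(src_in, tgt_in, src_out, tgt_out):
    """Hypothetical helper: write the surviving pairs to two new aligned files."""
    with open(src_in, encoding="utf-8") as fs, open(tgt_in, encoding="utf-8") as ft:
        pairs = filter_empty_pairs(fs, ft)
    with open(src_out, "w", encoding="utf-8") as fs, \
         open(tgt_out, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```

For example, `clean_files("europarl.en", "europarl.es", "clean.en", "clean.es")` (hypothetical file names) writes two files with the same number of lines. This only removes the empty lines: where two sentences were merged on one side, the neighbouring non-empty pair is kept even though its two sides no longer correspond fully, so such pairs may still need checking, for instance by comparing their lengths.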
Regards,
Vivek Vardhan Adepu
IRC: vivekvelda*/naan_dhaan*

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
