[Moses-support] Regarding cleaning corpus

Aqlan fares Thu, 17 May 2018 11:37:50 -0700

Hi,
I am working on Arabic MT, using different tokenization schemes.
The different schemes produce different line lengths, which can cause
imbalances between the schemes when I clean the corpus to remove
lines longer than 85 words.
To avoid this imbalance, suppose I have 4 schemes (A, B, C, D) and I
need to remove, across all files, every line whose scheme B version
exceeds 85 words.
How can I do that using clean-corpus-n.perl?
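
To make the request concrete, here is a minimal Python sketch of the
behaviour I am after. It works outside clean-corpus-n.perl and says nothing
about that script's own options. It assumes all files are line-aligned and
that "words" means whitespace-separated tokens. The names keep_mask,
filter_parallel and the ".clean" suffix are hypothetical, chosen only for
illustration.

```python
def keep_mask(reference_lines, max_len=85):
    """Return one boolean per line: True if the line has at most
    max_len whitespace-separated tokens."""
    return [len(line.split()) <= max_len for line in reference_lines]


def filter_parallel(files, reference, max_len=85, suffix=".clean"):
    """Write a filtered copy of every file in `files`, keeping only the
    line numbers whose line in `reference` passes the length check.

    Returns the number of lines kept.
    """
    # Decide which line numbers survive by looking at the reference file only.
    with open(reference, encoding="utf-8") as ref:
        mask = keep_mask(ref, max_len)

    for path in files:
        with open(path, encoding="utf-8") as src:
            lines = src.readlines()
        # All files must be aligned line by line with the reference.
        if len(lines) != len(mask):
            raise ValueError(f"{path} has {len(lines)} lines, "
                             f"reference has {len(mask)}")
        with open(path + suffix, "w", encoding="utf-8") as dst:
            dst.writelines(l for l, keep in zip(lines, mask) if keep)

    return sum(mask)
```

For example, filter_parallel(["corpus.A", "corpus.B", "corpus.C",
"corpus.D"], reference="corpus.B") would apply the scheme B decision to all
four files (the file names here are placeholders). The other side of the
parallel corpus would presumably need to go in the same list so that it
stays aligned too.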


Thanks for any help you may offer.

Best Regards.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
