Re: [Moses-support] how to clean the UN corpus

Rajen Chatterjee Mon, 01 Dec 2014 09:07:20 -0800

If your parallel corpus is not sentence-aligned, you could try a sentence
aligner tool, which can extract parallel sentence pairs with some degree of
confidence.
For example, the Microsoft Bilingual Sentence Aligner:
http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
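
Before running an aligner, it may help to confirm how far apart the two
files actually are. Below is a minimal Python sketch, not the Moses cleaning
script itself: the function names and the thresholds (1 to 80 tokens, length
ratio of 9) are my own assumptions for illustration. The first function
compares line counts; the second drops empty, overlong, or badly mismatched
pairs, and it only makes sense once line i of one side really corresponds to
line i of the other.

```python
from typing import Iterable, Iterator, Tuple


def line_counts_match(src_lines: Iterable[str],
                      tgt_lines: Iterable[str]) -> Tuple[bool, int, int]:
    """Return (same_count, n_src, n_tgt) for two sequences of lines."""
    n_src = sum(1 for _ in src_lines)
    n_tgt = sum(1 for _ in tgt_lines)
    return n_src == n_tgt, n_src, n_tgt


def filter_pairs(src_lines: Iterable[str],
                 tgt_lines: Iterable[str],
                 min_len: int = 1,
                 max_len: int = 80,
                 max_ratio: float = 9.0) -> Iterator[Tuple[str, str]]:
    """Yield (src, tgt) pairs that pass simple length checks.

    Assumes the inputs are already line-aligned. The thresholds are
    illustrative defaults, not values taken from any Moses script.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        n_s, n_t = len(src.split()), len(tgt.split())
        # Drop pairs where either side is too short or too long.
        if not (min_len <= n_s <= max_len and min_len <= n_t <= max_len):
            continue
        # Drop pairs whose token counts differ too much.
        if max(n_s, n_t) / max(1, min(n_s, n_t)) > max_ratio:
            continue
        yield src.strip(), tgt.strip()


# Usage with files (the paths here are hypothetical):
#   with open("2000.ar", encoding="utf-8") as a, \
#        open("2000.en", encoding="utf-8") as e:
#       print(line_counts_match(a, e))
```

If the counts differ, filtering alone will not repair the corpus, since every
line after the first mismatch is shifted; that is the case where a sentence
aligner is needed.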



On Mon, Dec 1, 2014 at 4:56 PM, emna hkiri <[email protected]> wrote:

>
> Dear friends, thank you very much for your earlier help; I hope you can
> help me again.
> I am trying to build an Arabic-English SMT system with Moses, but during
> training GIZA++ does not produce the alignment. This is because the UN ar-en
> corpus is not well cleaned: the two sides are not parallel, as they do not
> have the same number of lines. I am working with the 2000 directory
> (2000ar and 2000en). Has anyone worked with the UN ar-en corpus?
> How can I get the same number of lines for ar and en in 2000 so that I can
> pass the cleaning step?
>
> Thank you in advance; I hope you will answer my question.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
-Regards,
 Rajen Chatterjee.

