If your parallel corpus is not sentence-aligned, you could try a sentence aligner tool, which can extract parallel sentences with some confidence. For example, the Microsoft Bilingual Sentence Aligner: http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
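Before running an aligner, it may help to find out which file pairs actually disagree in line count. Below is a minimal Python sketch of that check, not part of Moses or of the aligner. The helper names `count_lines` and `mismatched_pairs` are made up for illustration, and the pairing rule (a source file and its translation share the same file name stem, e.g. doc1.ar and doc1.en) is only an assumption about how the two directories are laid out.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file without loading it all into memory."""
    # Binary mode avoids decoding errors on badly encoded corpus files.
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def mismatched_pairs(src_dir, tgt_dir):
    """Return (stem, src_lines, tgt_lines) for file pairs whose counts differ.

    Assumption (hypothetical layout): a source file and its translation
    share the same stem, e.g. doc1.ar and doc1.en. Adjust as needed.
    """
    tgt_by_stem = {p.stem: p for p in Path(tgt_dir).iterdir() if p.is_file()}
    report = []
    for src in sorted(Path(src_dir).iterdir()):
        tgt = tgt_by_stem.get(src.stem)
        if not src.is_file() or tgt is None:
            continue  # skip subdirectories and files with no counterpart
        n_src, n_tgt = count_lines(src), count_lines(tgt)
        if n_src != n_tgt:
            report.append((src.stem, n_src, n_tgt))
    return report
```

Calling `mismatched_pairs("2000ar", "2000en")` would list the pairs that need sentence alignment (or need to be dropped) before the cleaning step. Pairs that already match in line count are not guaranteed to be correctly aligned, so this is only a first filter.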
On Mon, Dec 1, 2014 at 4:56 PM, emna hkiri <[email protected]> wrote:

> Dear Friends thank you a lot for your help before and i hope that you will
> help me again
> i try to build an arabic-english SMT with moses
> but in the training Giza do not do the alignment it is because the corpus
> UN ar-en is not well cleaned ; in fact this is the problem because they are
> not parallel ;they have not the same number of lines. i'm working with 2000
> directory (2000ar and 2000en). does anyone worked with UN ar-en corpus???
> i want to ask how to make the same number of lines for ar-en in 2000 in
> order to pass the cleaning step
>
> thank you in advance i hope you will answer my question
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
-Regards,
Rajen Chatterjee.
