Hi, the script clean-corpus-n.perl takes care of both problems: it drops sentence pairs that are too long, and pairs where the ratio of token counts between the two sentences is out of whack (a GIZA++ requirement).
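To make the two checks concrete, here is a rough Python sketch of that kind of filter. It is not the actual clean-corpus-n.perl code: the function names and the default thresholds (min_len, max_len, max_ratio) are assumptions chosen for illustration, so check the script itself for the values it really uses.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from clean-corpus-n.perl.

def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Return True if a sentence pair passes the length and ratio checks."""
    s = len(src.split())  # token count, source side (whitespace tokenized)
    t = len(tgt.split())  # token count, target side

    # Length check: both sides must fall within [min_len, max_len].
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False

    # Ratio check: neither side may be more than max_ratio times the other.
    return s <= max_ratio * t and t <= max_ratio * s


def clean_corpus(pairs, **thresholds):
    """Keep only the (source, target) pairs that pass keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **thresholds)]
```

A pair with one token on one side and ten on the other fails the ratio check under these assumed defaults, and an empty line on either side fails the length check. Both sides are dropped together, so the two files stay line-aligned.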
-phi

On Fri, Jan 27, 2012 at 10:56 PM, Taylor Rose <[email protected]> wrote:
> Hey all,
>
> When I'm training I'm getting errors after the alignment code runs. I
> think the alignment itself is failing. I am creating the tm myself. Is
> there anything complex I need to consider? I wrote a script that
> tokenizes and escapes dangerous characters. I also cut off segments that
> are too long. What is this Giza ratio business? How does that tie in?
>
> Thanks,
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
> Server: freenode
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
