Hi,

the script clean-corpus-n.perl takes care of this: it removes sentence
pairs that are too long, and pairs where the ratio of token counts
between the two sentences is too skewed (a GIZA++ requirement).
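
To give a rough idea of what that filtering amounts to, here is a minimal
sketch in Python. It is not the script's actual code: the function names
are made up for illustration, and the limits used here (1 to 80 tokens,
ratio of at most 9 to 1) are assumptions, so check the script itself for
the values it really applies.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Decide whether a tokenized sentence pair should stay in the corpus.

    The limits are illustrative assumptions, not values read from the script.
    """
    src_len = len(src.split())
    tgt_len = len(tgt.split())

    # Drop pairs with an empty side (also avoids dividing by zero below).
    if src_len == 0 or tgt_len == 0:
        return False

    # Drop pairs where either side is too short or too long.
    if not (min_len <= src_len <= max_len and min_len <= tgt_len <= max_len):
        return False

    # Drop pairs whose token counts are too far apart in either direction.
    return src_len / tgt_len <= max_ratio and tgt_len / src_len <= max_ratio


def clean_corpus(pairs, **limits):
    """Return only the (source, target) pairs that pass keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **limits)]
```

So a pair with 20 tokens on one side and 1 on the other would be dropped
under these limits, even though neither side is too long on its own.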

-phi

On Fri, Jan 27, 2012 at 10:56 PM, Taylor Rose <
[email protected]> wrote:

> Hey all,
>
> When I'm training I'm getting errors after the alignment code runs. I
> think the alignment itself is failing. I am creating the tm myself. Is
> there anything complex I need to consider? I wrote a script that
> tokenizes and escapes dangerous characters. I also cut off segments that
> are too long. What is this Giza ratio business? How does that tie in?
>
> Thanks,
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
>     Server: freenode
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>