Hi,

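I think that means you have a word (or factor) on line 20636 of your corpus
that is too long for symal.  symal.cpp has a hardcoded word-length limit: it
asserts that every word is shorter than MAX_WORD-1 = 999 bytes (MAX_WORD is
1000), and the word in your log is 999 bytes long.  Note that strlen() counts
bytes, not characters, so words containing multi-byte UTF-8 characters (such
as Vietnamese letters with diacritics) reach the limit sooner.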
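If you want to confirm which line is the culprit before cleaning, something like the following Python sketch should find it. It assumes the limit shown in your log (the assertion strlen(w) < MAX_WORD-1 with MAX_WORD = 1000) and whitespace-tokenised UTF-8 text; the script name find_long_words.py is only an example. Since strlen() counts bytes, it measures each token's UTF-8 length in bytes rather than characters.

```python
import sys

# Assumption: symal's limit as reported in the log, i.e. the assertion
# strlen(w) < MAX_WORD - 1 with MAX_WORD = 1000.
MAX_WORD = 1000


def too_long_tokens(line, max_word=MAX_WORD):
    """Return (token, byte_length) for each token that would trip the assertion.

    strlen() counts bytes, so length is measured on the UTF-8 encoding.
    For a factored corpus this checks the whole word|factor token, which is
    stricter than checking each factor on its own.
    """
    hits = []
    for token in line.split():
        n = len(token.encode("utf-8"))
        if n >= max_word - 1:
            hits.append((token, n))
    return hits


def report(path, max_word=MAX_WORD):
    """Yield (line_number, byte_length) for every over-long token in a file."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for _token, n in too_long_tokens(line, max_word):
                yield lineno, n


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python find_long_words.py corpus.en corpus.vi
    for path in sys.argv[1:]:
        for lineno, n in report(path):
            print(f"{path}:{lineno}: token of {n} bytes")
```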

If you run your corpus through the following script before calling the
training script, it should get rid of any segment containing a word that is
too long:

  scripts/training/clean-corpus-n.perl
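For illustration, the long-word part of that cleaning amounts to roughly the following Python sketch (not the actual clean-corpus-n.perl code, which also filters by sentence length). It assumes the same MAX_WORD = 1000 limit from your log and drops a sentence pair if either side has a token of 999 bytes or more; the script and file names are hypothetical.

```python
import sys

# Assumption: same limit as in the log (MAX_WORD = 1000, words must be
# shorter than MAX_WORD - 1 bytes). This is only the long-word filter;
# clean-corpus-n.perl also does its own sentence-length filtering.
MAX_WORD = 1000


def has_long_token(line, max_word=MAX_WORD):
    """True if any whitespace-separated token is >= max_word - 1 UTF-8 bytes."""
    return any(len(t.encode("utf-8")) >= max_word - 1 for t in line.split())


def filter_pairs(src_lines, tgt_lines, max_word=MAX_WORD):
    """Yield only the sentence pairs where neither side has an over-long token.

    Both sides are dropped together so the two files stay line-aligned.
    zip() stops at the shorter input, so the inputs should have equal length.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        if not (has_long_token(src, max_word) or has_long_token(tgt, max_word)):
            yield src, tgt


def clean_files(src_in, tgt_in, src_out, tgt_out):
    """Write the surviving pairs to two new, still-parallel files."""
    with open(src_in, encoding="utf-8") as fs, open(tgt_in, encoding="utf-8") as ft:
        with open(src_out, "w", encoding="utf-8") as out_s, \
                open(tgt_out, "w", encoding="utf-8") as out_t:
            for src, tgt in filter_pairs(fs, ft):
                out_s.write(src)
                out_t.write(tgt)


if __name__ == "__main__" and len(sys.argv) == 5:
    # Usage (hypothetical names): python drop_long_words.py c.en c.vi c.clean.en c.clean.vi
    clean_files(*sys.argv[1:])
```

Dropping both sides of a pair at once is what keeps the cleaned files usable for GIZA++, which expects line n of one file to translate line n of the other.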

Regards,
Ben

---------- Forwarded message ----------
> From: 蒋乾 <[email protected]>
> To: [email protected]
> Date: Sat, 16 Jul 2011 09:27:55 +0800
> Subject: [Moses-support] Can't generate symmetrized alignment file
> Hi,
>
> When I trained an English-to-Vietnamese translation model, I got the
> following error:
>
> "|tools/moses-scripts/scripts-20110322-0943//training/symal/symal
> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
> jameswork/En_Vi//model/aligned.grow-diag-final-and
>
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
>
> 20636: target len=999 is not less than MAX_WORD-1=999
>
> symal: symal.cpp:83: int getals(std::fstream&, int&, int*, int&, int*):
> Assertion `strlen(w)<1000-1' failed.
>
> sh: line 1: 16708 Broken pipe
> tools/moses-scripts/scripts-20110322-0943//training/symal/giza2bal.pl -d
> "gzip -cd jameswork/En_Vi//giza.vi-en/vi-en.A3.final.gz" -i "gzip -cd
> jameswork/En_Vi//giza.en-vi/en-vi.A3.final.gz"
>
>      16709 Aborted                 (core dumped) |
> tools/moses-scripts/scripts-20110322-0943//training/symal/symal
> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
> jameswork/En_Vi//model/aligned.grow-diag-final-and
>
> Exit code: 134
> ERROR: Can't generate symmetrized alignment file"
>
> Why does this happen, and how can I fix it?
>
> Thank you.
>
> P.S. When I used only part of the parallel corpus, the error did not occur.
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
