Hi,

I think that means you have a word (or factor) that is too long on line 20636 of your corpus. symal.cpp has a hardcoded word-length limit (MAX_WORD = 1000), and the assertion in your log fails as soon as a word reaches 999 characters.
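To see which segments would trip the limit before training, a small check along these lines might help. It is only a sketch, not part of Moses: the function name `find_long_tokens` and the file name in the usage comment are made up for illustration, and it assumes tokens are whitespace-separated and that symal counts bytes (it uses `strlen`), so lengths are measured on the UTF-8 encoding.

```python
import sys

# Assumed to mirror the hardcoded limit reported in the symal log
# ("MAX_WORD-1=999"); not read from Moses itself.
MAX_WORD = 1000


def find_long_tokens(lines, limit=MAX_WORD - 1):
    """Yield (line_number, byte_length) for each whitespace-separated
    token whose UTF-8 length is not less than `limit`."""
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            # Measure bytes rather than characters, on the assumption
            # that symal's strlen() check works on the raw encoding.
            length = len(token.encode("utf-8"))
            if length >= limit:
                yield line_number, length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_long_tokens.py corpus.vi
    with open(sys.argv[1], encoding="utf-8") as corpus:
        for line_number, length in find_long_tokens(corpus):
            print(f"{line_number}: token of {length} bytes")
```

Running it on each side of the parallel corpus should point at line 20636 on whichever side symal calls the "target", provided the line numbers in the alignment files match the corpus one to one.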
If you run your corpus through the following script before calling the training script, it should get rid of any segment containing a word that is too long:

scripts/training/clean-corpus-n.perl

Regards,
Ben

---------- Forwarded message ----------
> From: 蒋乾 <[email protected]>
> To: [email protected]
> Date: Sat, 16 Jul 2011 09:27:55 +0800
> Subject: [Moses-support] Can't generate symmetrized alignment file
>
> Hi,
>
> When I trained an English-to-Vietnamese translation model, I got the
> following error:
>
> "|tools/moses-scripts/scripts-20110322-0943//training/symal/symal
> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
> jameswork/En_Vi//model/aligned.grow-diag-final-and
>
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
>
> 20636: target len=999 is not less than MAX_WORD-1=999
>
> symal: symal.cpp:83: int getals(std::fstream&, int&, int*, int&, int*):
> Assertion `strlen(w)<1000-1' failed.
>
> sh: line 1: 16708 Broken pipe
> tools/moses-scripts/scripts-20110322-0943//training/symal/giza2bal.pl -d
> "gzip -cd jameswork/En_Vi//giza.vi-en/vi-en.A3.final.gz" -i "gzip -cd
> jameswork/En_Vi//giza.en-vi/en-vi.A3.final.gz"
>
> 16709 Aborted (core dumped) |
> tools/moses-scripts/scripts-20110322-0943//training/symal/symal
> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
> jameswork/En_Vi//model/aligned.grow-diag-final-and
>
> Exit code: 134
> ERROR: Can't generate symmetrized alignment file"
>
> What causes this, and how can I solve it?
>
> Thank you.
>
> P.S. When I used only part of the parallel corpus, the error did not occur.
