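It's complaining about a single string of roughly 1000 characters, not a sentence of 1000 words: the assertion that fails, strlen(w)<1000-1, checks the length of one token. Is there possibly a very, very long word somewhere in the corpus? Cleaning to 0-50 tokens per sentence limits how many tokens a sentence has, not how long each token is, so it would not catch that.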
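One way to look for such a word is a minimal sketch like the one below. It scans each side of the training corpus for whitespace-separated tokens of 999 bytes or more, which is the limit (MAX_WORD-1, with MAX_WORD = 1000) named in the error. It assumes the overlong token comes from the corpus text itself and that symal splits words on whitespace. The function name find_long_tokens and the command-line usage are my own and are not part of the Moses scripts. The script counts bytes rather than Unicode characters because strlen does, so non-ASCII text hits the limit sooner.

```python
import sys

# Limit taken from the error message: symal asserts strlen(w) < MAX_WORD-1,
# with MAX_WORD = 1000, so any token of 999 bytes or more makes it abort.
MAX_TOKEN_BYTES = 999


def find_long_tokens(lines, limit=MAX_TOKEN_BYTES):
    """Yield (line_number, byte_length, first_40_bytes) for every
    whitespace-separated token of at least `limit` bytes.

    `lines` is an iterable of bytes objects (e.g. a file opened in "rb"
    mode); line numbers are 1-based.
    """
    for lineno, line in enumerate(lines, start=1):
        # bytes.split() with no argument splits on runs of ASCII whitespace
        for token in line.split():
            if len(token) >= limit:
                yield lineno, len(token), token[:40]


def main(paths):
    for path in paths:
        # Binary mode so lengths are measured in bytes, as strlen() counts them
        with open(path, "rb") as f:
            for lineno, length, head in find_long_tokens(f):
                preview = head.decode("utf-8", errors="replace")
                print(f"{path}:{lineno}: {length}-byte token starting {preview!r}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, e.g., python find_long_tokens.py corpus.es corpus.en (the file names are hypothetical). Each hit is printed with its line number, so the sentence pair can be fixed or dropped before you re-run training.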
On 27 January 2011 02:33, Joachim Van den Bogaert <[email protected]> wrote:
> Hi everyone,
>
> I encountered a problem during training in step 3 (Align words) with the
> following message:
>
> Using SCRIPTS_ROOTDIR:
> /opt/moses-tools/moses-scripts/scripts-20100707-1101/
> Using single-thread GIZA
> (3) generate word alignment @ Wed Jan 26 19:12:08 UTC 2011
> Combining forward and inverted alignment from files:
> /mnt/data//giza.es-en/es-en.A3.final.{bz2,gz}
> /mnt/data//giza.en-es/en-es.A3.final.{bz2,gz}
> Executing: mkdir -p /mnt/data//model
> Executing:
>
> /opt/moses-tools/moses-scripts/scripts-20100707-1101//training/symal/giza2bal.pl -d "gzip -cd /mnt/data//giza.en-es/e
> gzip -cd /mnt/data//giza.es-en/es-en.A3.final.gz"
> |/opt/moses-tools/moses-scripts/scripts-20100707-1101//training/symal/symal
> -a
> nal="yes" -final="yes" -both="yes" >
> /mnt/data//model/aligned.grow-diag-final-and
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> 1500066: target len=999 is not less than MAX_WORD-1=999
> symal: symal.cpp:83: int getals(std::fstream&, int&, int*, int&, int*):
> Assertion `strlen(w)<1000-1' failed.
> Aborted
> Exit code: 134
> ERROR: Can't generate symmetrized alignment file
>
> Has anyone encountered this message before?
> And does anyone have a clue what it means?
>
> I cleaned the corpus to contain sentences with only 0-50 tokens.
> I checked this after the cleaning procedure, so isn't it strange that I get
> the message:
>
> target len=999 is not less than MAX_WORD-1=999
>
> Thanks,
> Joachim
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
