Hi guys,

Thanks for the input. Next time, I will look in the source before posting :-). It turns out there were some garbage tokens in my files. I am going to give the training another try. I'll let you know the outcome.
Regards,
Joachim

-----Original message-----
From: Suzy Howlett [mailto:[email protected]]
Sent: Wednesday, 26 January 2011 23:11
To: [email protected]
CC: Joachim Van den Bogaert; [email protected]
Subject: Re: [Moses-support] Moses alignment problem

Well, when I look in symal.cpp (where the assertion was), I see the assertion is
  assert(strlen(w)<MAX_WORD-1);
and up the top it says
  #define MAX_WORD 10000 // maximum lengthsource/target strings
so it does sound like your target sentence is longer than it was expecting, although the limit in my version is 10,000 instead of 1,000. I have no idea why your limit would be different, and I don't really know anything about this part of the translation process, so what I'm about to suggest may be a really bad idea. What happens if you change the limit to 10,000?

Best,
Suzy

On 27/01/11 8:52 AM, Hieu Hoang wrote:
> Sorry, it crashed while symmetrising the alignment, so that has nothing to
> do with word size. So I'm not sure why it crashed.
>
> On 27 January 2011 04:32, Hieu Hoang <[email protected]> wrote:
>
> It's complaining about a long sentence with over 1000 characters,
> rather than words. Is there possibly a very, very long word?
>
> On 27 January 2011 02:33, Joachim Van den Bogaert <[email protected]> wrote:
>
> Hi everyone,
>
> I encountered a problem during training in step 3 (Align words) with the
> following message:
>
> Using SCRIPTS_ROOTDIR:
> /opt/moses-tools/moses-scripts/scripts-20100707-1101/
> Using single-thread GIZA
> (3) generate word alignment @ Wed Jan 26 19:12:08 UTC 2011
> Combining forward and inverted alignment from files:
> /mnt/data//giza.es-en/es-en.A3.final.{bz2,gz}
> /mnt/data//giza.en-es/en-es.A3.final.{bz2,gz}
> Executing: mkdir -p /mnt/data//model
> Executing:
> /opt/moses-tools/moses-scripts/scripts-20100707-1101//training/symal/giza2bal.pl -d "gzip -cd /mnt/data//giza.en-es/e
> gzip -cd /mnt/data//giza.es-en/es-en.A3.final.gz"
> |/opt/moses-tools/moses-scripts/scripts-20100707-1101//training/symal/symal
> -a
> nal="yes" -final="yes" -both="yes" >
> /mnt/data//model/aligned.grow-diag-final-and
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> 1500066: target len=999 is not less than MAX_WORD-1=999
> symal: symal.cpp:83: int getals(std::fstream&, int&, int*, int&, int*):
> Assertion `strlen(w)<1000-1' failed.
> Aborted
> Exit code: 134
> ERROR: Can't generate symmetrized alignment file
>
> Has anyone encountered this message before?
> And does anyone have a clue what it means?
>
> I cleaned the corpus to contain sentences with only 0-50 tokens.
> I checked this after the cleaning procedure, so isn't it strange that I get
> the message:
>
> target len=999 is not less than MAX_WORD-1=999
>
> Thanks,
> Joachim
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Suzy Howlett
http://www.showlett.id.au/
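
The cause reported at the top of this thread, garbage tokens long enough to fail `strlen(w)<MAX_WORD-1`, can be checked for before training. Cleaning the corpus to 0-50 tokens per sentence bounds the number of tokens, not the length of any single token, so one very long token can pass that filter. Below is a minimal sketch in Python, assuming tokens are whitespace-delimited and that `w` in the assertion holds one token at a time (the thread does not confirm this). `find_long_tokens` is a hypothetical helper, not part of Moses, and the default limit of 1000 is taken from the error message quoted above.

```python
# Hypothetical helper, not part of Moses: flag tokens that would fail the
# check quoted from symal.cpp, assert(strlen(w) < MAX_WORD - 1).
MAX_WORD = 1000  # assumption: value implied by "MAX_WORD-1=999" in the log


def find_long_tokens(lines, max_word=MAX_WORD):
    """Return (line_number, token_index, byte_length) for each oversized token.

    `lines` is an iterable of bytes objects, e.g. a file opened in "rb" mode.
    Lengths are counted in bytes because strlen() counts bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        # bytes.split() with no argument splits on ASCII whitespace.
        for tok_idx, token in enumerate(line.split(), start=1):
            if len(token) >= max_word - 1:
                hits.append((line_no, tok_idx, len(token)))
    return hits
```

To run it over one side of a corpus, open the file in binary mode and pass the file object, for example `find_long_tokens(open(path, "rb"))`, where `path` is whichever training file you want to check. Any line it reports is a candidate for removal or repair before rerunning the alignment step.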
