Hi,

I recently had a very similar error message when binarizing the reordering table. In my case the table did not contain inappropriate characters, but was missing some.
See http://www.mail-archive.com/[email protected]/msg02869.html

A more informative error message would certainly help to 'repair' the 'corrupt' files.

Rico Sennrich wrote:
> Hi all,
>
> I recently got this error message when trying to train a hierarchical
> model in Moses:
>
> r...@rico-work:~
> $ /home/rico/bin/moses-scripts/scripts-20100920-1324//training/phrase-extract/score /home/rico/smtworkspace/SACde-fr141//model/extract.sorted /home/rico/smtworkspace/SACde-fr141//model/lex.f2e /home/rico/smtworkspace/SACde-fr141//model/rule-table.half.f2e --Hierarchical
> Score v2.0 written by Philipp Koehn
> scoring methods for extracted rules
> processing hierarchical rules
> Loading lexical translation table from /home/rico/smtworkspace/SACde-fr141//model/lex.f2e.........
> score: score.cpp:434: void outputPhrasePair(std::vector<PhraseAlignment*, std::allocator<PhraseAlignment*> >&, float): Assertion `bestAlignment->alignedToT[ j ].size() == 1' failed.
> Aborted
>
> After a bit of searching, I found that Moses doesn't like words in square
> brackets very much. This is the line in extract.sorted that caused the crash:
>
> ( [X][X] [X] ||| , « [X][X] [...] [X] ||| 0-1 1-2 ||| 0.111111
>
> I just committed a patch that returns a (hopefully) more informative error
> message. Still, people need to make sure that the training texts do not
> contain any words in square brackets.
>
> Does anyone have a better idea how to handle this?
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Thomas Meyer
E-Mail: [email protected]
Web: www.idiap.ch
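
The check suggested in the quoted message (no words in square brackets in the training texts) could be sketched roughly as follows. This is only an illustration, not part of Moses: the function name `find_bracketed_tokens` is hypothetical, and it assumes that a "word in square brackets" means a whitespace-separated token that starts with "[" and ends with "]", such as "[...]".

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a "word in square brackets" is a whitespace-separated token
# that starts with "[" and ends with "]", for example "[...]" or "[X]".
BRACKETED = re.compile(r"^\[.*\]$")


def find_bracketed_tokens(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, token) for every bracketed token found."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            if BRACKETED.match(token):
                hits.append((lineno, token))
    return hits


if __name__ == "__main__":
    # Made-up sample sentences, standing in for a tokenized training text.
    sample = [
        "( voir page 3 )",
        ", « texte [...] suite",
        "pas de crochets ici",
    ]
    for lineno, token in find_bracketed_tokens(sample):
        print(f"line {lineno}: {token}")
```

To scan a real corpus, an open file object can be passed in place of the sample list, since the function only iterates over lines. What to do with the reported tokens (drop the sentence pair, or replace the brackets) is left open here.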
