Mirko Plitt wrote:

> Loading lexical translation table from ./model/lex.f2eline 2 in ./model/lex.f2e has wrong number of tokens, skipping:
> 0
> ERROR: Execution of: /usr/bin/training/phrase-extract/score ./model/extract.sorted ./model/lex.f2e ./model/phrase-table.half.f2e
> died with signal 11, without coredump
In my experience this means you have a null byte in your data. Did you look at line 2 of model/lex.f2e? I suspect you will find what looks like garbage, depending on what you view it with. Try this to count the lines with null bytes in each file of your original data:

    grep -Pc '[\000]' <files ...>

(If your grep doesn't support Perl-style regexp syntax (-P), you'll have to express that a different way.)

If this turns out to be the problem, and you don't want to run GIZA again from scratch, let me know and I can tell you how I've hacked up the files in ./model/ to restart the Moses training script from step 5.

By the way, do you happen to be using the Chinese UN data? I've found that two years of this data are pretty screwed up, including null bytes. These files obviously got corrupted at some point. I find the UN data to be very frustrating, since it's odd and messy in many different ways. But such large portions!

- John Burger
  MITRE
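
If your grep lacks -P, one way to express the same check is a few lines of Python. This is only a sketch and not part of Moses: the function name null_byte_lines is made up for illustration. It reads the file in binary mode, so corrupt bytes should not trigger decoding errors, and it reports line numbers rather than just a count.

```python
def null_byte_lines(path):
    """Return the 1-based numbers of the lines in `path` that contain a null byte."""
    hits = []
    # Binary mode: lines are split on b"\n" and no character decoding is attempted.
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            if b"\x00" in line:
                hits.append(lineno)
    return hits
```

Calling null_byte_lines("model/lex.f2e"), or the same on each original corpus file, would give the line numbers to inspect; an empty list means no null bytes were found.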
