You're right about the null byte, and given how much time I've spent on this
training, I'm definitely interested in any shortcut that would save me from
starting from scratch!

The data I'm training on is not Chinese UN data but a pretty large dump of 
Microsoft software strings in English and French.

Thanks a bunch,
Mirko

-----Original Message-----
From: John Burger [mailto:[email protected]] 
Sent: Tuesday, May 12, 2009 2:04 PM
To: Mirko Plitt
Cc: [email protected]
Subject: Re: [Moses-support] PhraseScore dies with signal 11

Mirko Plitt wrote:

> Loading lexical translation table from ./model/lex.f2e
> line 2 in ./model/lex.f2e has wrong number of tokens, skipping:
> 0 ERROR: Execution of: /usr/bin/training/phrase-extract/score ./model/extract.sorted ./model/lex.f2e ./model/phrase-table.half.f2e
>   died with signal 11, without coredump

In my experience this means you have a null byte in your data.  Did  
you look at line 2 of model/lex.f2e?  I suspect you will find what  
looks like garbage, depending on what you view it with.

Try this to find lines with null bytes in your original data:

   grep -Pc '[\000]' <files ...>

(If your grep doesn't support Perl-style regexp syntax (-P), you'll
have to express that a different way.)
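
For a grep without -P, one possible "different way" is a few lines of Python. This is only a sketch: the function name find_null_lines is made up for illustration, and it assumes the corpus files are newline-delimited. It reads in binary mode so that corrupt bytes cannot cause a decoding error.

```python
import sys


def find_null_lines(path):
    """Return the 1-based numbers of the lines in `path` that contain a null byte."""
    hits = []
    # Binary mode: the data may be corrupt, so do not try to decode it.
    with open(path, "rb") as handle:
        for lineno, line in enumerate(handle, start=1):
            if b"\x00" in line:
                hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_nulls.py <files ...>
    for path in sys.argv[1:]:
        hits = find_null_lines(path)
        # Show the count per file, plus the first few offending line numbers.
        print(f"{path}: {len(hits)} line(s) with null bytes {hits[:10]}")
```

Like grep -c, this reports a count per file, but it also lists the first few offending line numbers so they can be inspected directly.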

If this turns out to be the problem, and you don't want to run GIZA  
again from scratch, let me know and I can tell you how I've hacked up  
the files in ./model/ to restart the Moses training script from step 5.

By the way, do you happen to be using the Chinese UN data?  I've found  
that two years of this data are pretty screwed up, including null  
bytes.  These files obviously got corrupted at some point.  I find the  
UN data to be very frustrating, since it's odd and messy in many  
different ways.  But such large portions!

- John Burger
   MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
