J C Read wrote:
According to wikipedia http://en.wikipedia.org/wiki/SIGSEGV signal 11 indicates
an invalid memory reference.
Yes, definitely, what we also call a "coredump" under AIX.
I eventually figured out that this was because of the data I was using.
That's often the case, an unfortunate data condition that is unexpected and unaccounted for in error recovery. That's usually hard to track, though...
Things to check:

Is the data sentence aligned?
Yes, europarl.lowercased.0-0.fr has 73835 lines:
   reprise de la session
   je déclare reprise la session du parlement européen qui avait (...)
   (...)
   des paroles , pas d' action .
   en attendant , deux mille personnes ont perdu la vie inutilement , (...)
and europarl.lowercased.0-0.en has 73835 lines:
   resumption of the session
   i declare resumed the session of the european parliament adjourned on (...)
   (...)
   more talk . no action .
   meanwhile , two thousand people in the last year have needlessly (...)
Has the data been cleaned with the clean script? (try using sentences of min 1
and max 100)
Yes, it went through the script, with the recommended parameters:

   bin/moses-scripts/scripts-YYYYMMDD-HHMM/training/clean-corpus-n.perl working-dir/corpus/europarl.tok fr en working-dir/corpus/europarl.clean 1 40

which reduced the number of sentences from the initial 100K to 73835.
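
For anyone who wants to double-check the two points above (equal line counts, and sentence lengths within the 1 to 40 range passed to the clean script), here is a minimal Python sketch. The helper names check_parallel and read_lines are hypothetical and not part of Moses, and whitespace token counting is my assumption about how the length limits apply; it only flags suspicious lines and does not reproduce everything clean-corpus-n.perl does.

```python
def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def check_parallel(src_lines, tgt_lines, min_len=1, max_len=40):
    """Return (line_number, reason) pairs for a sentence-aligned corpus.

    Line number 0 is used for a corpus-level problem (line count mismatch).
    Lengths are counted in whitespace-separated tokens (an assumption).
    """
    problems = []
    if len(src_lines) != len(tgt_lines):
        problems.append(
            (0, f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}")
        )
    # zip stops at the shorter side, so only paired lines are inspected.
    for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        for side, line in (("source", src), ("target", tgt)):
            n = len(line.split())
            if n < min_len or n > max_len:
                problems.append((i, f"{side} has {n} tokens"))
    return problems


# Example usage (paths taken from this thread; adjust to your setup):
# fr = read_lines("europarl.lowercased.0-0.fr")
# en = read_lines("europarl.lowercased.0-0.en")
# for line_no, reason in check_parallel(fr, en):
#     print(line_no, reason)
```

An empty result means both files have the same number of lines and every sentence falls within the limits; anything reported is worth looking at before blaming GIZA itself.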

Any other suggestions?

Could the small size of my training data (only 73K sentences) be causing unexpected underflows or something similar in GIZA? Doesn't it make sense to run the whole process on a small dataset to start with? (I don't have access to powerful machines at the moment, so I am running this on my personal laptop.)

Thanks for your support, much appreciated.

--
Hubert Crépy
