Hi all,

A number of my jobs keep dying during MERT, and I'm having trouble tracking down what's going on. I submit all of my jobs using SGE, so it's possible there's an interaction there.
Can anyone help me understand what's going on below:

sh: line 1: 29188 Killed /free/lane/slm-merging-trunk/moses-cmd/src/moses -config /scratch4/lane/2011-12-15_europarl/config/de-en/filtered/filtered.ttable20.dist05.synlm50.ini -inputtype 0 -w -0.178571 -slm 0.178571 -lm 0.089286 -d 0.053571 0.053571 0.053571 0.053571 0.053571 0.053571 0.053571 -tm 0.035714 0.035714 0.035714 0.035714 0.035714 -n-best-list run1.best100.out 100 -input-file /scratch4/lane/2011-12-15_europarl/corpus/dev.tok.norm.de > run1.out
Exit code: 137
The decoder died. CONFIG WAS -w -0.178571 -slm 0.178571 -lm 0.089286 -d 0.053571 0.053571 0.053571 0.053571 0.053571 0.053571 0.053571 -tm 0.035714 0.035714 0.035714 0.035714 0.035714

I've searched for the meaning of exit code 137, and what I've read says that's the exit code for a process that received kill signal 9.

I'm especially puzzled by "sh: line 1: 29188 Killed". I'm pretty sure that the safesystem function in the moses-mert.pl script is printing "Exit code: 137", and I'm assuming that the moses command itself is being launched by the "system(@_)" command within that same safesystem function. But I don't know what is responsible for printing "sh: line 1: 29188 Killed", or what "line 1" and "29188" refer to.

For what it's worth, I'm attaching the results of running qacct -j on the job after it died. I don't think it is relevant, but I guess it could be.

Thanks,
Lane
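
For reference, the reading of exit code 137 as "signal 9" follows the usual POSIX shell convention of reporting 128 plus the signal number for a process that was terminated by a signal. Below is a minimal sketch of that arithmetic in Python, assuming a POSIX system; describe_exit_code is a hypothetical helper written for illustration and is not part of Moses or SGE.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status as a POSIX shell would report it in $?."""
    if code > 128:
        # Shell convention: status = 128 + number of the terminating signal.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


# On Linux this prints: killed by signal 9 (SIGKILL)
print(describe_exit_code(137))
```

This only decodes the number; it says nothing about which process sent the signal or why.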
==============================================================
qname        all.q
hostname     quad19.scream.lab
group        scream
owner        lane
project      NONE
department   defaultdepartment
jobname      de-en.mert
jobnumber    20337
taskid       undefined
account      sge
priority     0
qsub_time    Mon Feb 13 14:08:54 2012
start_time   Mon Feb 13 14:09:05 2012
end_time     Wed Feb 15 14:54:52 2012
granted_pe   NONE
slots        1
failed       0
exit_status  2
ru_wallclock 175547
ru_utime     175460.360
ru_stime     21.147
ru_maxrss    23910412
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    6545996
ru_majflt    7568
ru_nswap     0
ru_inblock   3067192
ru_oublock   22064
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     9545
ru_nivcsw    256918
cpu          175481.507
mem          2516411.448
io           4.733
iow          0.000
maxvmem      25.026G
arid         undefined
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
