Hi Barry,

In out.job12017-aa,

Linux bunix-server 2.6.35-30-generic #60-Ubuntu SMP Mon Sep 19 20:45:08 UTC 2011 i686 GNU/Linux
ulimit: Command not found.
/home/guchun/Work/mosesdecoder/moses-cmd/src/moses: Exec format error. Wrong Architecture.
Newline in variable name.

bunix-server is the hostname of the execution node. Complaints are similar in out.job12017-ab (run on another node), too.

Cheers,

Guchun

On 16 November 2011 09:21, Barry Haddow <[email protected]> wrote:

> Hi Guchun
>
> The mert.out file doesn't help that much. Is there any more information in
> the err and out files?
> eg
> /home/guchun/Work/tasks/ro-en/tuning-sge/out.job12017-aa
> /home/guchun/Work/tasks/ro-en/tuning-sge/err.job12017-aa
>
> cheers - Barry
>
> On Tuesday 15 Nov 2011 22:01:41 Guchun Zhang wrote:
> > Hi there,
> >
> > I am trying to tune on an SGE cluster. I ran the following command on the
> > head node,
> >
> > /home/guchun/Work/moses-scripts/scripts-20111111-1703/training/mert-moses.pl \
> > /home/guchun/Work/tasks/ro-en/corpus/euparl.lc.ro \
> > /home/guchun/Work/tasks/ro-en/corpus/euparl.lc.en \
> > /home/guchun/Work/mosesdecoder/moses-cmd/src/moses \
> > /home/guchun/Work/tasks/ro-en/trained/model/moses.ini \
> > --mertdir /home/guchun/Work/mosesdecoder/mert/ \
> > --rootdir /home/guchun/Work/moses-scripts/scripts-20111111-1703/ \
> > --working-dir /home/guchun/Work/tasks/ro-en/tuning-sge/ \
> > --jobs 2 --decoder-flag "-v 0" >& /home/guchun/Work/tasks/ro-en/tuning-sge/mert.out &
> >
> > I got the following error,
> >
> > check_exit_status
> > check_exit_status of job -aa
> > check_exit_status of job -ab
> > *wc: euparl.lc.ro.split12017-aa.trans: No such file or directory*
> > *Split (-aa) were not entirely translated*
> > outputN= inputN=11966
> > outputfile=euparl.lc.ro.split12017-aa.trans
> > inputfile=euparl.lc.ro.split12017-aa
> > *Split (-ab) were not entirely translated*
> > outputN=0 inputN=11966
> > outputfile=euparl.lc.ro.split12017-ab.trans
> > inputfile=euparl.lc.ro.split12017-ab
> > *everything crashed, not trying to resubmit jobs*
> > *Got interrupt or something failed.*
> > kill_all_and_quit
> > qdel 56
> > Executing: qdel 56
> > Exit code: 1
> > qdel 57
> > Executing: qdel 57
> > Exit code: 1
> > Translation was not performed correctly
> > or some of the submitted jobs died.
> > qdel function was called for all submitted jobs
> > Exit code: 1
> > The decoder died. CONFIG WAS -w -0.322581 -lm 0.161290 -d 0.193548 -tm 0.064516 0.064516 0.064516 0.064516 0.064516
> >
> > Any clue what may cause the problem? I have also attached the output file
> > (mert.out) for full inspection.
> >
> > Everything runs fine in serial execution (without --jobs 2).
> >
> > I wonder if this can be attributed to my SGE configuration. So if possible,
> > could you please also give some advice on the parameter configuration of
> > SGE?
> >
> > Many thanks in advance,
> >
> > Guchun

--
*Guchun Zhang*
Localization Engineer
Alpha CRC Ltd | Cambridge, UK
Direct: +44 1223 431035
[email protected] <[email protected]>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
