On 12/06/13 05:10 PM, Lin wrote:
> Hi,
>
> Yes, they are. When I run "top" or "ps", there are exactly 16 Ray ranks and
> one mpiexec process on the oak machine.

But this is before one of the Ray ranks receives a SIGHUP (1, Hangup), right?

> But this problem does not always happen because I have gotten some good
> results from Ray when I ran it for other datasets.

I never got this SIGHUP with Ray. That's strange.

Is it reproducible? That is, if you run the same thing 10 times, do you get
this SIGHUP 10 times too?

> Thanks
> Lin
>
> On Wed, Jun 12, 2013 at 8:00 AM, Sébastien Boisvert <[email protected]> wrote:
>
> > On 10/06/13 05:26 PM, Lin wrote:
> >
> > > Hi,
> > >
> > > Thanks for your answers.
> > > However, I got the error message from nohup.out. That is to say, I
> > > have used nohup to run Ray.
> > >
> > > This is my command:
> > > nohup mpiexec -n 16 Ray Col.conf &
> >
> > Are all your MPI ranks running on the "oak" machine?
> >
> > > And the Col.conf contains:
> > >
> > > -k 55 # this is a comment
> > > -p /s/oak/a/nobackup/lin/Art/Col___illumina_art/Col_il1.fastq
> > > /s/oak/a/nobackup/lin/Art/Col___illumina_art/Col_il2.fastq
> > >
> > > -o RayOutputOfCol
> > >
> > > On Mon, Jun 10, 2013 at 2:02 PM, Sébastien Boisvert <[email protected]> wrote:
> > >
> > > > On 09/06/13 11:35 AM, Lin wrote:
> > > >
> > > > > Hi, Sébastien
> > > > >
> > > > > I changed the Max Kmer to 64 and set k to 55 in a run.
> > > > > But it always ends up with a problem like this:
> > > > > "mpiexec noticed that process rank 11 with PID 25012 on node
> > > > > oak exited on signal 1(Hangup)"
> > > > > Could you help me figure it out?
> > > >
> > > > The signal 1 is SIGHUP according to this list:
> > > >
> > > > $ kill -l
> > > >  1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
> > > >  6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
> > > > 11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
> > > > 16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
> > > > 21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
> > > > 26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
> > > > 31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
> > > > 38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
> > > > 43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
> > > > 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
> > > > 53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
> > > > 58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
> > > > 63) SIGRTMAX-1  64) SIGRTMAX
> > > >
> > > > This signal is not related to the compilation option MAXKMERLENGTH=64.
> > > >
> > > > You are getting this signal because the parent process of your
> > > > mpiexec process dies (probably because you are closing your terminal),
> > > > and this causes a SIGHUP to be sent to your Ray processes.
> > > >
> > > > There are several solutions to this issue (pick one solution from
> > > > the list below):
> > > >
> > > > 1. Use nohup (i.e.: nohup mpiexec -n 999 Ray -p data1.fastq.gz data2.fastq.gz)
> > > >
> > > > 2. Launch your work inside a screen session (the screen command)
> > > >
> > > > 3. Launch your work inside a tmux session (the tmux command)
> > > >
> > > > 4. Use a job scheduler (like Moab, Grid Engine, or another).
> > > >
> > > > --SÉB--

_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
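
P.S. A note on the nohup suggestion in the list of solutions above: nohup starts the command with SIGHUP ignored, and an ignored signal stays ignored across exec. The following is only a minimal Python sketch of that mechanism, assuming a Linux-like system with Python 3. It is not taken from Ray or mpiexec. The helper name run_and_hang_up and the sleeping child process are made up for illustration, and the sketch says nothing about how mpiexec treats signals for the ranks it launches.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a long-running job (not Ray itself).
CHILD = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_and_hang_up(ignore_hup):
    """Start a child, send it SIGHUP, and report what happened.

    Returns the child's return code if it exited (a negative value means
    "killed by that signal number"), or None if it was still running.
    """
    disposition = signal.SIG_IGN if ignore_hup else signal.SIG_DFL

    def set_sighup():
        # Runs in the child just before exec. An ignored signal stays
        # ignored after exec, which is what nohup relies on.
        signal.signal(signal.SIGHUP, disposition)

    child = subprocess.Popen(CHILD, preexec_fn=set_sighup)
    child.send_signal(signal.SIGHUP)
    try:
        return child.wait(timeout=2)
    except subprocess.TimeoutExpired:
        # The child ignored SIGHUP and is still alive; clean it up.
        child.kill()
        child.wait()
        return None


if __name__ == "__main__":
    code = run_and_hang_up(ignore_hup=False)
    if code is not None and code < 0:
        print("unprotected child killed by", signal.Signals(-code).name)
    if run_and_hang_up(ignore_hup=True) is None:
        print("child with SIGHUP ignored was still running")
```

The negative return code in the unprotected case is the same information that mpiexec reports as "exited on signal 1 (Hangup)". Running the real command several times and recording how each run ends would answer the reproducibility question in the same spirit.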
