On 19/06/13 05:00 PM, Lin wrote:
> Hi, Sébastien,
>
> I tried your suggestion and ran my job inside screen without nohup.
> It did not stop with signal SIGHUP.
> However, Ray almost ran out of memory, and it has been running for over
> three days since I started the job.
> It always repeats information like this:

You can reduce the number of reads or increase the number of machines on which 
you run Ray.
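
For the second option, here is a minimal sketch of spreading the same run
across two machines with an MPI machine file (the second host name "elm" and
the slot counts are placeholders; --hostfile is the Open MPI spelling, MPICH
uses -f instead):

$ cat hostfile
oak slots=16
elm slots=16
$ nohup mpiexec -n 32 --hostfile hostfile Ray Col.conf &

Both machines need to see the input files and the output directory, for
example over a shared filesystem; the distributed graph then takes roughly
half the memory per machine.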



> "
> Rank 1 is counting k-mers in sequence reads [11200001/22166944]
> Speed RAY_SLAVE_MODE_ADD_VERTICES 0 units/second
> Estimated remaining time for this step: -8 seconds
> Rank 10 has 621700000 vertices
> Rank 10: assembler memory usage: 36323284 KiB
> Rank 13 has 621600000 vertices
> Rank 13: assembler memory usage: 36323288 KiB
> Rank 8 has 621700000 vertices
> Rank 8: assembler memory usage: 36323280 KiB
> Rank 7 has 621700000 vertices
> Rank 7: assembler memory usage: 36323280 KiB
> Rank 3 has 621800000 vertices
> Rank 3: assembler memory usage: 36323284 KiB
> Rank 2 has 621700000 vertices
> Rank 2: assembler memory usage: 36323284 KiB
> Rank 1 has 621700000 vertices
> Rank 1: assembler memory usage: 36319196 KiB
> Rank 12 has 621700000 vertices
> Rank 12: assembler memory usage: 36323280 KiB
> Rank 6 has 621700000 vertices
> .....
> .....
> Rank 5 is counting k-mers in sequence reads [11000001/22166944]
> Speed RAY_SLAVE_MODE_ADD_VERTICES 0 units/second
> Estimated remaining time for this step: -8 seconds
> "
>
>
>
>
>
> On Fri, Jun 14, 2013 at 3:24 PM, Sébastien Boisvert
> <[email protected]> wrote:
>
>     Hello,
>
>     I don't really understand what the problem is.
>
>     You said that one of your Ray processes is receiving a SIGHUP signal,
>     even though you are running the whole thing with nohup, right ?
>
>
>     One explanation could be that another user is sending SIGHUP to your Ray
>     processes with the kill program.
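>
>     A quick check (a sketch, assuming a Linux procps-style ps) is to see
>     whether the ranks are still attached to a terminal:
>
>         $ ps -o pid,ppid,tty,cmd -C mpiexec,Ray
>
>     If TTY shows "?", the processes are detached from any terminal; if it
>     still shows a pts/N, a SIGHUP from that terminal closing is still possible.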
>
>
>     Can you try running your job inside screen or tmux ?
>
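>     For example (the session name "ray" below is just a placeholder):
>
>         # screen: start a named session, run the job, then detach with Ctrl-a d
>         screen -S ray
>         mpiexec -n 16 Ray Col.conf
>         # later, reattach with: screen -r ray
>
>     tmux works the same way: "tmux new -s ray", detach with Ctrl-b d,
>     and reattach with "tmux attach -t ray".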
>
>     On 13/06/13 01:27 PM, Lin wrote:
>
>         On 12/06/13 05:10 PM, Lin wrote:
>
>              Hi,
>
>              Yes, they are. When I run "top" or "ps", there are exactly 16
>              Ray ranks and one mpiexec process on the oak machine.
>
>
>         But this is before one of the Ray ranks receives a SIGHUP (1, 
> Hangup), right ?
>
>         Yes. Sometimes the SIGHUP leads to more than one process being killed.
>
>         This is the latest error message I got:
>         """
>         mpiexec noticed that process rank 8 with PID 18757 on node oak exited
>         on signal 1 (Hangup).
>         --------------------------------------------------------------------------
>         3 total processes killed (some possibly by mpiexec during cleanup)
>         """
>
>
>              But this problem does not always happen because I have gotten 
> some good results from Ray when I ran it for other datasets.
>
>
>         I never got this SIGHUP with Ray. That's strange.
>
>         Is it reproducible, meaning that if you run the same thing 10 times, 
> do you get this SIGHUP 10 times too ?
>
>         I cannot say it is totally reproducible, but if I run the same thing
>         10 times, I guess 9 of them will fail.
>
>         Yes, it is really strange because I did not get any error when I ran 
> it the first several times.
>
>
>
>
>         On Thu, Jun 13, 2013 at 7:59 AM, Sébastien Boisvert
>         <[email protected]> wrote:
>
>              On 12/06/13 05:10 PM, Lin wrote:
>
>                  Hi,
>
>                  Yes, they are. When I run "top" or "ps", there are exactly
>                  16 Ray ranks and one mpiexec process on the oak machine.
>
>
>              But this is before one of the Ray ranks receives a SIGHUP (1, 
> Hangup), right ?
>
>
>
>                  But this problem does not always happen because I have 
> gotten some good results from Ray when I ran it for other datasets.
>
>
>              I never got this SIGHUP with Ray. That's strange.
>
>              Is it reproducible, meaning that if you run the same thing 10 
> times, do you get this SIGHUP 10 times too ?
>
>
>                  Thanks
>                  Lin
>
>
>
>                  On Wed, Jun 12, 2013 at 8:00 AM, Sébastien Boisvert
>                  <[email protected]> wrote:
>
>                       On 10/06/13 05:26 PM, Lin wrote:
>
>                           Hi,
>
>                           Thanks for your answers.
>                           However, I got the error message from nohup.out. 
> That is to say, I have used nohup to run Ray.
>
>                           This is my command:
>                           nohup mpiexec -n 16 Ray Col.conf &
>
>
>                       Are all your MPI ranks running on the "oak" machine ?
>
>
>                           And the Col.conf contains:
>
>                           -k 55  # this is a comment
>                           -p 
> /s/oak/a/nobackup/lin/Art/Col_______illumina_art/Col_il1.fastq
>                                
> /s/oak/a/nobackup/lin/Art/Col_______illumina_art/Col_il2.fastq
>
>                           -o RayOutputOfCol
>
>
>
>
>
>
>                           On Mon, Jun 10, 2013 at 2:02 PM, Sébastien Boisvert
>                           <[email protected]> wrote:
>
>                                On 09/06/13 11:35 AM, Lin wrote:
>
>                                    Hi, Sébastien
>
>                                    I changed the maximum k-mer length to 64
>                                    and set k to 55 in a run, but it always
>                                    ends up with a problem like this:
>                                    "mpiexec noticed that process rank 11 with
>                                    PID 25012 on node oak exited on signal 1
>                                    (Hangup)"
>                                    Could you help me figure it out?
>
>
>                                The signal 1 is SIGHUP according to this list:
>
>                                $ kill -l
>                                 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
>                                 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
>                                11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
>                                16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
>                                21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
>                                26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
>                                31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
>                                38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
>                                43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
>                                48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
>                                53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
>                                58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
>                                63) SIGRTMAX-1  64) SIGRTMAX
>
>
>                                This signal is not related to the compilation
>                                option MAXKMERLENGTH=64.
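>
>                                (For reference, that limit is normally chosen
>                                when Ray is compiled; with the usual Makefile
>                                this would be something along the lines of
>
>                                    make MAXKMERLENGTH=64
>
>                                while -k 55 then selects the k-mer length at
>                                run time. The exact build invocation may
>                                differ on your system.)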
>
>                                You are getting this signal because the parent
>                                process of your mpiexec process dies (probably
>                                because you are closing your terminal), and
>                                this causes the SIGHUP that is sent to your
>                                Ray processes.
>
>
>                                There are several solutions to this issue
>                                (pick one solution from the list below):
>
>
>                                1. Use nohup (i.e.: nohup mpiexec -n 999 Ray
>                                   -p data1.fastq.gz data2.fastq.gz)
>
>                                2. Launch your work inside a screen session 
> (the screen command)
>
>                                3. Launch your work inside a tmux session (the 
> tmux command)
>
>                                4. Use a job scheduler (like Moab, Grid Engine,
>                                   or another); see the sketch below.
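>
>                                As an illustration of option 4, a minimal Grid
>                                Engine submission script (the parallel
>                                environment name "mpi", the job name, and the
>                                slot count are placeholders that depend on
>                                your site):
>
>                                    #!/bin/bash
>                                    #$ -N ray_job     # job name
>                                    #$ -cwd           # run in the submission directory
>                                    #$ -pe mpi 16     # request 16 slots
>                                    mpiexec -n 16 Ray Col.conf
>
>                                Submitted with "qsub", the job keeps running
>                                after you log out, so no SIGHUP is involved.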
>
>
>                                --SÉB--
>
>


_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
