Hi,

(Please use the mailing list.)

The "nodes=10:ppn=12" means that you want 10 machines with 12 processor cores 
each.

10 * 12 = 120. Typically, you would use 1 MPI rank per core. Thus
"mpiexec -n 48" should be "mpiexec -n 120".

Depending on the scheduling policy on your supercomputer, using only 48 MPI
ranks may fill just the first
4 nodes (4 * 12 = 48). If that's the case, the other 6 nodes are not being used.
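Put together, the corrected job script would look something like this (a sketch based on the script quoted below; resource requests and file names are unchanged):

```shell
#!/bin/bash
#PBS -l nodes=10:ppn=12
#PBS -l walltime=72:00:00
#PBS -l mem=220g

# One MPI rank per core: 10 nodes * 12 cores/node = 120 ranks.
mpiexec -n 120 /global/software/ray/ray230/Ray -o ~/ASSEMBLY/ \
    -i amalgamated.diginorm.2.pe.fa \
    -s amalgamated.diginorm.2.se.fa
```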


________________________________
From: Wilhelm, Roland [rwilh...@mail.ubc.ca]
Sent: Monday, September 15, 2014 5:58 PM
To: Boisvert, Sebastien
Subject: FW: [WG-accounts] Parallel account request rwilhelm

Good Afternoon Dr. Boisvert,

You may recall that I am trying to get Ray to perform an assembly on one of the 
Compute Canada clusters. You had instructed me to try running Ray with simply 
"-n 48" rather than using the -mini-ranks option. I tried this and it ran 
considerably longer than with mini-ranks (~1 hr vs. 15 min). 
However, I still received a similar error message. Before troubling you again, 
I decided to send an inquiry to one of the experts at Compute Canada. He wasn't 
able to help (see below).

The command I ran was:

mpiexec -n 48 /global/software/ray/ray230/Ray -o ~/ASSEMBLY/ -i 
amalgamated.diginorm.2.pe.fa -s amalgamated.diginorm.2.se.fa

The files are 25 GB combined.

Please let me know if there is anything you think I may be doing incorrectly!
Thanks in advance,

Roli


________________________________
From: Paul Wellings [welli...@ucalgary.ca]
Sent: Monday, September 15, 2014 1:46 PM
To: Wilhelm, Roland
Cc: accou...@westgrid.ca; Doug Phillips
Subject: Re: [WG-accounts] Parallel account request rwilhelm

From: Wilhelm, Roland [rwilh...@mail.ubc.ca]
Date: Monday, September 15, 2014 at 10:25 AM
To: Paul Wellings [welli...@ucalgary.ca], accou...@westgrid.ca
Subject: RE: [WG-accounts] Parallel account request rwilhelm

Good Morning Paul,

Hi Roland,



I am not sure if this is the appropriate venue to be asking a specific question 
about an error message I'm receiving, but I have been in contact with the 
creator of the software and he has not been very helpful. The software is a "de 
novo" assembly tool called Ray, and it runs under mpiexec. I've been able to get 
the program to run for just over an hour before getting an error message. I'm 
guessing it has to do with a memory overrun. Below are some of the details of 
what I ran:

Submission parameters:
#!/bin/bash
#PBS -l nodes=10:ppn=12
#PBS -l walltime=72:00:00
#PBS -l mem=220g
#PBS -N 'SUPERASSEMBLY'
#PBS -m abe
#PBS -M roliwilh...@gmail.com

mpiexec -n 48 /global/software/ray/ray230/Ray -o ~/ASSEMBLY/ -i 
amalgamated.diginorm.2.pe.fa -s amalgamated.diginorm.2.se.fa

You are asking for 220 GB (10 nodes' worth) but only using 87 GB of memory, so 
memory on a “job basis” is not being exhausted.  I note you are asking for 
ppn=12 but are only using 48 processes; I presume that’s because you are expecting 
the processes to be distributed evenly across the 10 nodes and you need memory 
rather than processor count.  I can’t determine from this whether there is a 
program problem or a problem with the analysis you are trying to do.  A 
different (older) version of Ray is available on
Nestor/Hermes, so that might be worth a try.  My colleague Doug Phillips may be 
able to comment further.
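If the intent really is to spread a smaller rank count across all 10 nodes rather than packing the first 4, the launcher usually has to be told so explicitly. Assuming the mpiexec here is Open MPI, a mapping option such as --map-by node does that (a sketch only; other MPI implementations use different flags):

```shell
# Place ranks round-robin across the allocated nodes instead of
# filling each node before moving to the next (Open MPI syntax).
mpiexec -n 48 --map-by node /global/software/ray/ray230/Ray -o ~/ASSEMBLY/ \
    -i amalgamated.diginorm.2.pe.fa -s amalgamated.diginorm.2.se.fa
```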

Sorry I can’t be of more help,
p.



Error Message:
--------------------------------------------------------------------------
mpiexec noticed that process rank 31 with PID 27642 on node cn0766 exited on 
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Exit_status=139
resources_used.cput=53:37:56
resources_used.mem=86940732kb
resources_used.vmem=97546344kb
resources_used.walltime=01:18:04
Error_Path: parallel:/home/rwilhelm/'SUPERASSEMBLY'.e540652
Output_Path: parallel:/home/rwilhelm/'SUPERASSEMBLY'.o540652
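Incidentally, the Exit_status=139 is consistent with the segfault above: a process killed by signal N is reported with exit status 128 + N, and SIGSEGV is signal 11.

```shell
# Exit status for a process killed by SIGSEGV (signal 11): 128 + 11.
echo $((128 + 11))
```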

Last chunk of Std-Out:
Rank 41 is counting k-mers in sequence reads [2000001/3703948]
Speed RAY_SLAVE_MODE_ADD_VERTICES 1219 units/second
Estimated remaining time for this step: 23 minutes, 17 seconds
Rank 12 has 45900000 vertices
Rank 12: assembler memory usage: 1837896 KiB
Rank 19 has 45900000 vertices
Rank 19: assembler memory usage: 1837948 KiB
Rank 18 has 45900000 vertices
Rank 18: assembler memory usage: 1837992 KiB
Rank 16 has 45900000 vertices
Rank 16: assembler memory usage: 1839088 KiB
Rank 10 has 45900000 vertices
Rank 10: assembler memory usage: 1841744 KiB
Rank 46 has 45900000 vertices
Rank 46: assembler memory usage: 1838032 KiB
Rank 27 has 45900000 vertices
Rank 27: assembler memory usage: 1837916 KiB
Rank 22 has 45900000 vertices
Rank 22: assembler memory usage: 1837932 KiB
Rank 42 has 45900000 vertices
Rank 42: assembler memory usage: 1821636 KiB


Thanks in advance,

Roland

________________________________________
From: Paul Wellings [welli...@ucalgary.ca]
Sent: Tuesday, August 05, 2014 7:09 AM
To: Wilhelm, Roland; accou...@westgrid.ca
Subject: Re: [WG-accounts] Parallel account request rwilhelm

-----Original Message-----
From: Wilhelm, Roland [rwilh...@mail.ubc.ca]
Date: Monday, August 4, 2014 at 11:43 AM
To: accou...@westgrid.ca
Subject: [WG-accounts] Parallel account request rwilhelm

>Dear WestGrid Rep,
>
>May I please make use of the Parallel system. I will be performing
>metagenomic assemblies using "Ray 2.3.0"
>
>Thank you in advance,
>
>Roland Wilhelm
>_______________________________________________
>accounts-l mailing list
>account...@lists.westgrid.ca
>

Hi Roland,

your parallel account has been created, it should be available for use
sometime this afternoon. Please see our Quickstart guide for information
(http://www.westgrid.ca/support/quickstart/parallel).

Best wishes,
p.


--
Paul Wellings | Analyst | Research Computing Services | University of
Calgary |
Math Sciences | 2500 University Drive NW, Calgary, AB, T2N 1N4 |
Phone: (403) 220-6970 | E-mail: welli...@ucalgary.ca |





_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
