Therefore, the user should split the input into more than 3 files when running on 1024 MPI ranks.
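A minimal sketch of such a split, assuming GNU `split` is available and the FASTQ files use strict 4-line records (no wrapped sequence lines). The helper name `split_fastq`, the output prefixes, and the chunk size are hypothetical and only for illustration; they do not come from the thread.

```shell
#!/bin/bash
# Sketch only: split one FASTQ file into numbered chunks.
# Assumes GNU split and strict 4-line FASTQ records (no wrapped sequences).

# split_fastq INPUT READS_PER_CHUNK PREFIX   (hypothetical helper name)
split_fastq() {
    local input="$1" reads_per_chunk="$2" prefix="$3"
    # Each record is 4 lines, so cutting on a multiple of 4 never splits a read.
    # -d gives numeric suffixes; -a 4 allows up to 10000 chunks.
    split -l "$((reads_per_chunk * 4))" -d -a 4 --additional-suffix=.fastq \
        "$input" "$prefix"
}

# Example calls (chunk size is illustrative, not a recommendation):
# split_fastq moa_all_R1.fastq     1000000 moa_R1_part_
# split_fastq moa_all_R2.fastq     1000000 moa_R2_part_
# split_fastq moa_all_single.fastq 1000000 moa_single_part_
```

If R1 and R2 list their reads in the same order and are split with the same chunk size, mates end up in chunks with the same number. The chunks would then be passed to Ray as one `-p` per pair of chunk files and one `-s` per single-end chunk; that Ray accepts these options repeatedly is an assumption here and should be checked against Ray's help output.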

On 23/04/13 04:37 PM, Daniel Gruner wrote:
> Here is the tail of the stdout:
>
> Rank 358 has 390000 sequence reads
> Rank 358: assembler memory usage: 273852 KiB
> Rank 920 has 330000 sequence reads
> Rank 920: assembler memory usage: 264648 KiB
> Rank 920 has 340000 sequence reads
> Rank 920: assembler memory usage: 264648 KiB
> Rank 358 has 400000 sequence reads
> Rank 358: assembler memory usage: 273852 KiB
> Rank 358 has 400690 sequence reads (completed)
> Rank 358 is writing checkpoint Sequences
> Rank 920 has 350000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 360000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 370000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 380000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 390000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 400000 sequence reads
> Rank 920: assembler memory usage: 281036 KiB
> Rank 920 has 400690 sequence reads (completed)
> Rank 920 is writing checkpoint Sequences
> Rank 961 has 0 sequence reads
> Rank 961: assembler memory usage: 265612 KiB
>
> The head node shows about 100 minutes of cpu usage on each rank.
>
> Danny
>
> On Tue, Apr 23, 2013 at 04:31:14PM -0400, Sébastien Boisvert wrote:
>> On 23/04/13 04:17 PM, Daniel Gruner wrote:
>>> Hi Sebastien,
>>>
>>> Yes, it is on the GPC.
>>> Here is the full script that was run:
>>>
>>> #!/bin/bash
>>> #PBS -l nodes=128:ppn=8
>>> #PBS -l walltime=10:00:00
>>> #PBS -N moa_Ray_flash_assembly_31mer_128nodes_disable_recycling
>>> cd $PBS_O_WORKDIR
>>> module load gcc
>>> module unload openmpi/1.4.4-intel-v12.1
>>> module load openmpi/1.4.4-gcc-v4.6.1
>>> mpiexec -n 1024 Ray -k 31 -o RayOutput_k31_128nodes_disable-recycling
>>> -route-messages -connection-type debruijn -routing-graph-degree 32
>>> -disable-recycling -p
>>> /scratch/a/abaker/acloutie/moa_ray_input/moa_all_R1.fastq
>>> /scratch/a/abaker/acloutie/moa_ray_input/moa_all_R2.fastq -s
>>> /scratch/a/abaker/acloutie/moa_ray_input/moa_all_single.fastq
>>> -read-write-checkpoints checkpoints_128nodes_disable_recycling
>>>
>>
>> The polytope graph is much better than the de Bruijn graph for routing
>> messages. But it is not the reason for the high number of I/O operations.
>>
>> A first possible reason for slow I/O is that the user has only 3 input
>> files, yet the job has 1024 MPI ranks.
>>
>> Sequencing data is definitely not produced as 3 files upstream.
>>
>> It's usually better to use more fastq input files.
>>
>> Another possible reason would be the checkpoints. Some of the checkpointing
>> code groups I/O operations, but some other parts do not.
>>
>> There is an open issue to fix this:
>> https://github.com/sebhtml/ray/issues/57
>>
>> So a tail on the standard output would (probably) help to zero in on the
>> real issue here.
>>
>>> Danny
>>>
>>>
>>> On Tue, Apr 23, 2013 at 04:11:33PM -0400, Sébastien Boisvert wrote:
>>>> Hello,
>>>>
>>>> I CC'ed the mail to the community mailing list.
>>>>
>>>> To subscribe:
>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>>>
>>>> On 23/04/13 04:01 PM, Daniel Gruner wrote:
>>>>> Hi Alison,
>>>>>
>>>>> We've noticed you are running a fairly large Ray assembly job, like you
>>>>> did last Saturday.
>>>>> It turns out that this job causes a high load on the
>>>>> filesystem, and the effect is to slow down access to files for everybody
>>>>> on SciNet.
>>>>>
>>>>
>>>> On GPC, right?
>>>>
>>>>> We are not sure why this is, and in fact I'd like to contact the
>>>>> developer of Ray, Sebastien Boisvert, about it. Perhaps you can tell me
>>>>> some details of your current calculation, using 1024 cores.
>>>>>
>>>>> There is a new version of Ray out, and it is conceivable that some of the
>>>>> problems have been addressed. Would you be able to test it? Supposedly it
>>>>> fixes a number of issues.
>>>>
>>>> The following options can do a lot of input/output operations:
>>>>
>>>> -write-kmers
>>>>
>>>> -read-write-checkpoints CheckpointDirectory
>>>>
>>>>>
>>>>> It may be useful to be able to tell the developer of Ray what exactly you
>>>>> are doing in your calculation, so that he may be able to determine if
>>>>> there is indeed a problem with his I/O strategy.
>>>>
>>>> Can you provide the complete command line?
>>>>
>>>>
>>>> The .fastq.gz file format also has readahead code implemented. That may
>>>> help.
>>>>
>>>>>
>>>>> I am copying Sebastien in this email.
>>>>>
>>>>> Thanks and regards,
>>>>> Danny
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
_______________________________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
