On 23/04/13 04:17 PM, Daniel Gruner wrote:
> Hi Sebastien,
>
> Yes, it is on the GPC.
> Here is the full script that was run:
>
> #!/bin/bash
> #PBS -l nodes=128:ppn=8
> #PBS -l walltime=10:00:00
> #PBS -N moa_Ray_flash_assembly_31mer_128nodes_disable_recycling
> cd $PBS_O_WORKDIR
> module load gcc
> module unload openmpi/1.4.4-intel-v12.1
> module load openmpi/1.4.4-gcc-v4.6.1
> mpiexec -n 1024 Ray -k 31 -o RayOutput_k31_128nodes_disable-recycling \
>   -route-messages -connection-type debruijn -routing-graph-degree 32 \
>   -disable-recycling \
>   -p /scratch/a/abaker/acloutie/moa_ray_input/moa_all_R1.fastq \
>      /scratch/a/abaker/acloutie/moa_ray_input/moa_all_R2.fastq \
>   -s /scratch/a/abaker/acloutie/moa_ray_input/moa_all_single.fastq \
>   -read-write-checkpoints checkpoints_128nodes_disable_recycling
>
The polytope graph is much better than the de Bruijn graph for routing
messages, but the choice of routing graph is not what causes the high
number of I/O operations.

One possible reason for the heavy I/O is that the job has only 3 input
files for 1024 MPI ranks. Sequencing data is certainly not produced
upstream as just 3 files, and it is usually better to use more FASTQ
input files.

Another possible reason is checkpointing. Some of the checkpointing code
groups its I/O operations, but other parts do not. There is an open issue
to fix this:
https://github.com/sebhtml/ray/issues/57

So a tail of the job's standard output would probably help to zero in on
the real issue here.

> Danny
>
>
> On Tue, Apr 23, 2013 at 04:11:33PM -0400, Sébastien Boisvert wrote:
>> Hello,
>>
>> I CC'ed the mail to the community mailing list.
>>
>> To subscribe:
>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>
>> On 23/04/13 04:01 PM, Daniel Gruner wrote:
>>> Hi Alison,
>>>
>>> We've noticed you are running a fairly large Ray assembly job, like you
>>> did last Saturday. It turns out that this job causes a high load on the
>>> filesystem, and the effect is to slow down access to files for everybody
>>> on SciNet.
>>>
>>
>> On the GPC, right?
>>
>>> We are not sure why this is, and in fact I'd like to contact the
>>> developer of Ray, Sebastien Boisvert, about it. Perhaps you can tell me
>>> some details of your current calculation, using 1024 cores.
>>>
>>> There is a new version of Ray out, and it is conceivable that some of
>>> the problems have been addressed. Would you be able to test it?
>>> Supposedly it fixes a number of issues.
>>
>> The following options can do a lot of input/output operations:
>>
>> -write-kmers
>>
>> -read-write-checkpoints CheckpointDirectory
>>
>>>
>>> It may be useful to be able to tell the developer of Ray what exactly
>>> you are doing in your calculation, so that he may be able to determine
>>> if there is indeed a problem with his I/O strategy.
>>
>> Can you provide the complete command line?
>>
>> The .fastq.gz file format has read-ahead code implemented too. That may
>> help.
>>
>>>
>>> I am copying Sebastien in this email.
>>>
>>> Thanks and regards,
>>> Danny
>>>

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
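Sébastien's suggestion to give Ray more FASTQ input files instead of one large R1/R2 pair could look roughly like the sketch below. It is not taken from Ray itself; it assumes uncompressed FASTQ with exactly 4 lines per record, R1 and R2 files whose records are in the same order, GNU split (for the -d flag), and that Ray accepts one -p option per pair of files. The function names split_fastq_pair and ray_pair_args and the chunk size are hypothetical. Splitting both mates with the same line count keeps the pairs aligned chunk by chunk.

```shell
#!/bin/bash
# Sketch only: split paired FASTQ files into matching chunks so Ray can be
# given many input files instead of one R1/R2 pair.
# Assumptions: uncompressed FASTQ, 4 lines per record, R1 and R2 records in
# the same order, GNU split (for -d). Function names are hypothetical.
set -euo pipefail

# split_fastq_pair R1 R2 OUTDIR READS_PER_CHUNK
split_fastq_pair() {
    local r1="$1" r2="$2" outdir="$3" reads_per_chunk="$4"
    local lines=$(( reads_per_chunk * 4 ))  # one FASTQ record = 4 lines
    mkdir -p "$outdir"
    # Using the same line count for both mates keeps pairs aligned:
    # R1_0000 matches R2_0000, R1_0001 matches R2_0001, and so on.
    split -d -a 4 -l "$lines" "$r1" "$outdir/R1_"
    split -d -a 4 -l "$lines" "$r2" "$outdir/R2_"
}

# ray_pair_args OUTDIR: print one "-p R1_chunk R2_chunk" group per line.
ray_pair_args() {
    local outdir="$1" f
    for f in "$outdir"/R1_*; do
        printf '%s %s %s\n' -p "$f" "$outdir/R2_${f##*/R1_}"
    done
}

# Example usage (hypothetical chunk size; paths must not contain spaces
# because the argument list relies on word splitting):
#   in=/scratch/a/abaker/acloutie/moa_ray_input
#   split_fastq_pair "$in/moa_all_R1.fastq" "$in/moa_all_R2.fastq" chunks 1000000
#   mpiexec -n 1024 Ray -k 31 -o RayOutput $(ray_pair_args chunks) \
#       -s "$in/moa_all_single.fastq"
```

The chunk size is a tuning knob: larger chunks mean fewer files, smaller chunks spread the reads over more files. Whether this actually reduces filesystem load on the GPC would need to be checked against the job's output, as suggested in the thread.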