Sorry for the cross-posting; my earlier email to the mpiblast mailing list was returned, perhaps because of the attachments.
Hi Aaron, Hi All,

I have been struggling to run mpiblast on the dataset of proteins I have from 36 fungal genomes. I want to compare all proteins to each other, which is usually called an 'all-against-all blast'. A simple blastp on this dataset, chopped into 36 fragments, takes around 24 hours on our cluster, which runs Score (version 5.8.4.r3) as its MPI environment and has 25 nodes, each with 4 cores and 8 GB of RAM. On the same cluster, when I run mpiblast on 38 fragments using 40 nodes, even after 12 days I only get 22% of the total estimated output (I know the size of the complete output from the earlier standard blast run).

Aaron suggested the following tests, but they have not solved the problem:

1) The --db-replicate-count=5 option, so that mpiblast distributes five copies of the database across the worker nodes, which may improve performance. With this option, I checked the results after 4 days and had only 6% of the output.

2) mpiblast is optimized for mpich or lam-mpi, so it may be useful to test mpiblast with these MPI implementations. We have GNU mpich installed, and I ran mpiblast on our dataset with --debug as below:

>mpiformatdb -i 36FungalJGIanigNbcin_M40 --nfrags=38 -p T --skip-reorder
>/usr/local/sge6.0/bin/lx24-amd64/qsub -pe mpich 10 /usr/local/mpich-GNU64/bin/mpirun -arch LINUX -machinefile /usr/local/mpich-GNU64/share/machines.LINUX -np 40 -nodes 10 -nolocal -allcpus -v /software/man2/manchester/mpiblast/tool/bin/mpiblast --debug -p blastp -d 36FungalJGIanigNbcin_M40 -i /scratch/man2/nwmcixa/mpiblast/work/36FungalJGIanigNbcin_M40 -m 8 -e 1e-5 -o /scratch/man2/nwmcixa/mpiblast/work/36FungalJGIanigNbcin_M40_mpi.out

It did not produce any output. The STDOUT, as the --debug option was used, is shown at:
http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/mpirun_36genomes.e29468

3) The benchmark dataset (300K erwinia query set against the first 14GB of nt using blastn). mpiblast was run on the benchmark dataset and produced 1.07Mb of output (shown at http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/echry_In_nt14G_m8.out ) within 14 minutes. The STDOUT is shown at:
http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/mpirun_benchmark.e29467
I am not sure whether this run produced the complete output, as compared to the standard mpiblast test on the benchmark dataset by Aaron Darling.

I am not sure what causes the slow mpiblast performance on my dataset. Is it possible to produce a version of mpiblast that splits the query dataset instead of the search (target) dataset when the query and target datasets are the same size? That way there would be no need to correct the e-value calculation, since every query fragment is searched against the complete dataset, and the outputs of the fragments can simply be concatenated to get the complete results. Running standard blast in this way produces the output in around 24 hours.

I hope to get some help resolving the mpiblast problem on the dataset I have.

Many Thanks,
Intikhab

-- 
Dr. Intikhab Alam
Research Associate
School of Computer Science
University of Manchester
LF1, Kilburn Building, Oxford Road
Manchester, M13 9PL
United Kingdom
http://www.cs.man.ac.uk/~ialam
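
P.S. To make the query-splitting idea concrete, here is a minimal sketch in Python of how the query set could be divided into fragments of roughly equal total sequence length. This is only an illustration of the approach, not how mpiblast works internally; the function names (read_fasta, split_queries, format_fasta) are hypothetical, and the greedy longest-first balancing is my own assumption about a reasonable way to even out the work per fragment.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_queries(records, n_chunks):
    """Split records into n_chunks lists with similar total residue counts.

    Greedy heuristic (an assumption, not mpiblast's method): take the
    longest sequences first and always add to the currently smallest chunk.
    """
    chunks = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        smallest = sizes.index(min(sizes))
        chunks[smallest].append((header, seq))
        sizes[smallest] += len(seq)
    return chunks


def format_fasta(chunk):
    """Render one chunk back to FASTA text, one sequence per line."""
    return "".join(f"{header}\n{seq}\n" for header, seq in chunk)


# Tiny made-up example; a real run would read the protein FASTA file instead.
example = [">a", "MKT", "LL", ">b", "MA", ">c", "M"]
fragments = split_queries(list(read_fasta(example)), 2)
for i, fragment in enumerate(fragments):
    print(f"fragment {i}: {[h for h, _ in fragment]}")
```

Each fragment would then be written to its own file and searched with standard blastp against the complete, unfragmented database, and the tabular outputs concatenated afterwards.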
