Fellow mpiBLAST users,

I have found the performance of mpiBLAST v. 1.4.0 to be a bit disappointing compared to previous versions. It is entirely possible that I am doing something wrong, so I would welcome any suggestions you might have.
Background

Cluster: 40 CPUs, 96 GB RAM aggregate.
  Master: 4 x dual-core Opteron, 32 GB RAM, Fedora Core 5
  Nodes (8): 2 x dual-core Opteron, 8 GB RAM each, Fedora Core 5
  Interconnect: Gigabit Ethernet

mpiBLAST: I have tried two builds of mpiBLAST 1.4.0:
  1. Compiled myself from source using the May 2005 (2.2.11) NCBI toolkit and MPICH2.
  2. RPMs from Joe Landman at Scalable Informatics (Oct 2004 NCBI toolkit, LAM-MPI 7.1.1).
The problem occurs with both installations, so it doesn't appear to be peculiar to a particular build.

The Data: A set of ESTs and consensus sequences built from contigged ESTs. 28,485 sequences, total query length ~26 million bases.

Databases: Either NCBI nr or RefSeq Protein (complete), divided into 38 segments by mpiformatdb.

Command line:
  mpirun -np 40 mpiblast -p blastx -d nr -i my.data.fasta -U -f 14 -I T -e 1 -v 25 -b 25 -o my.blast.output &

The Problem: Everything starts out fine, and the cluster hums along with all CPUs nearly pegged. Then, about half an hour into the run, CPU utilization drops through the floor. The jobs continue to run, but they spend more time idling than actually searching. Running top on the head node, I find that one mpiBLAST process (presumably the writer process, since it has the lowest PID) stays at ~100% CPU all the time, while the remaining processes idle most of the time and show only brief bursts of activity. The same behavior is seen on the worker nodes (of course, there is no continuously active writer process on the nodes). The average load for the entire cluster is initially > 95%, but it then drops to < 20% for the remainder of the run. At some points the load is only 1-2%. It is really quite depressing to look at my Ganglia page and see all of those wasted cycles.

The problem is most pronounced with large datasets. When the dataset is <= 3000 sequences I don't see it. Of course, such a job is done in < 30 minutes, which is about the time I start to see the drop in CPU utilization on the larger jobs.
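The observation above, that query sets of 3000 sequences or fewer finish without the slowdown, suggests one way to test whether the drop tracks query size: split the query file into batches of at most 3000 sequences and run each batch separately. The following is only a minimal sketch in Python, assuming a plain FASTA input. The function names and the "batch" output prefix are made up for illustration and are not part of mpiBLAST, and nothing here has been confirmed to avoid the problem.

```python
from itertools import islice


def read_fasta_records(lines):
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        # Skip blank lines; make sure every kept line ends with a newline.
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def batch_records(records, batch_size=3000):
    """Group records into lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def split_fasta(path, batch_size=3000, prefix="batch"):
    """Write prefix_0001.fasta, prefix_0002.fasta, ... and return the file names.

    The prefix and naming scheme are hypothetical choices for this sketch.
    """
    names = []
    with open(path) as handle:
        batches = batch_records(read_fasta_records(handle), batch_size)
        for index, batch in enumerate(batches, start=1):
            name = f"{prefix}_{index:04d}.fasta"
            with open(name, "w") as out:
                out.writelines(batch)
            names.append(name)
    return names
```

Calling split_fasta("my.data.fasta") on the 28,485-sequence query set would produce ten files (nine of 3000 sequences and one of 1485), each of which could then be passed to -i in place of the full file.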
I have searched the mailing list archives and have not found anything that resembles this problem. Have other people seen this behavior with version 1.4.0? Any insights from the developers?

Thanks in advance for any and all input.

Kevin M. Carr
**************************
Bioinformatics Specialist
Research Technology Support Facility
202-D Biochemistry Bldg.
Michigan State University
East Lansing, MI 48824
Ph: (517) 353-6794
Fax: (517) 353-8638
**************************

_______________________________________________
Mpiblast-users mailing list
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
