Hi Aaron, I would like to get the benchmark dataset so that I can test mpiblast performance. Could you please point me to it? In the meantime I am trying to get mpich running on the cluster.
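Roughly what I am trying for the mpich setup, in case it matters (mpich2's mpd-based startup; the host file name and node count below are just placeholders for our cluster):

    mpdboot -n 41 -f mpd.hosts     # start an mpd ring across the compute nodes
    mpdtrace                       # check that every node has joined the ring
    mpiexec -n 41 hostname         # quick smoke test before attempting an mpiblast run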
Many Thanks,
Intikhab

----- Original Message -----
From: "intikhab alam" <[EMAIL PROTECTED]>
To: "Aaron Darling" <[EMAIL PROTECTED]>
Sent: Friday, March 02, 2007 1:20 PM
Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset

> Hi Aaron,
>
> I would like to try out the benchmark dataset. Could you point me to where I can download it?
>
> Intikhab
>
> ----- Original Message -----
> From: "Aaron Darling" <[EMAIL PROTECTED]>
> To: "intikhab alam" <[EMAIL PROTECTED]>
> Sent: Friday, March 02, 2007 6:21 AM
> Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset
>
> : It sounds like there must be something causing an mpiblast-specific communications bottleneck in your system. Anybody else have ideas here? If you're keen to verify that, you could run mpiblast on the benchmark dataset we were using on Green Destiny and compare runtimes. My latest benchmark data set (dated June 2005) has a runtime of about 16 minutes for 64 nodes to search the 300K erwinia query set against the first 14 GB of nt using blastn. Each compute node in that machine was a 667 MHz Transmeta chip with 640 MB RAM, connected via 100 Mbit Ethernet. I was using mpich2-1.0.1, no SCore. Based on paper specs, your cluster should be quicker than that.
> :
> : On the other hand, if you've got wild amounts of load imbalance, --db-replicate-count=5 may not be enough, and 41 may prove ideal (where 41 = the number of nodes in your cluster). In that case, mpiblast will have effectively copied the entire database to each node, totally factoring load imbalance out of the compute-time equation. Your database is much smaller than each node's core memory, and a single fragment is probably much larger than each node's CPU cache, so I can't think of a good reason not to fully distribute the database, apart from the time it takes to copy DB fragments around.
> :
> : In any case, keep me posted if you discover anything.
> :
> : -Aaron
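Concretely, the two suggestions above (more database fragments than workers, and a higher replicate count) amount to something like the following. This is only a sketch: it assumes mpiblast 1.4.0 and an mpich2-style mpiexec, the file names and node counts are placeholders, and the mpiformatdb fragment option (--nfrags / -N) should be checked against the installed version's help output.

    # cut the database into roughly 3x as many fragments as there are worker nodes
    mpiformatdb --nfrags=123 -i nt.fas -p F
    # run with one full copy of the database per worker (41 workers here), using the
    # blastall-style -p/-i/-d/-o options that mpiblast passes through to BLAST
    mpiexec -n 43 mpiblast -p blastn -i queries.fas -d nt -o results.txt \
        --db-replicate-count=41    # 43 = 41 workers plus the master and writer processes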
> : intikhab alam wrote:
> : > Hi Aaron,
> : >
> : > As per your suggestion, I used the following option:
> : >
> : > --db-replicate-count=5
> : >
> : > assuming it might help reach the 24-hour mark for completing the job. However, I see that only 6% of the (total estimated) output has been generated so far, i.e. after 4 days (4 * 24 hrs). If it continues this way, my mpiblast run would finish in 64 days. Any other suggestions to improve the running time?
> : >
> : > Intikhab
> : >
> : > ----- Original Message -----
> : > From: "Aaron Darling" <[EMAIL PROTECTED]>
> : > To: "intikhab alam" <[EMAIL PROTECTED]>; <[email protected]>
> : > Sent: Wednesday, February 21, 2007 1:33 AM
> : > Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset
> : >
> : > : Hi Intikhab...
> : > :
> : > : intikhab alam wrote:
> : > : > : can take a long time to compute the effective search space required for exact e-value calculation. If that's the problem, then you would find just one mpiblast process consuming 100% CPU on the rank 0 node for hours or days, without any output.
> : > : >
> : > : > Is the effective search space calculation done on the master node? If yes, this mpiblast job stayed at the master node for some hours, and then all the compute nodes were busy with >90% usage the whole time, with output being generated continuously until the 12th day, when I killed the job.
> : > :
> : > : Yes, the search space calculation is done on the master node, and it sounds like using the --fast-evalue-approximation command-line switch would save you a few hours, which is pretty small compared to the weeks or months that the rest of the search is taking.
> : > :
> : > : > : The more likely limiting factor is load imbalance on the cluster.
> : > : >
> : > : > In this case, do you think the job should finish on some nodes earlier than on others? In my case the job was running on all the nodes with >90% usage, and the last output I got was on the final day, when I killed the job.
> : > :
> : > : It's possible the other nodes may continue running mpiblast workers which are waiting to send results back to the mpiblast writer process.
> : > :
> : > : > : If some database fragments happen to have a large number of hits and others have few, and the database is distributed as one fragment per node, then the computation may be heavily imbalanced and may run quite slowly. CPU consumption as reported by a CPU monitoring tool may not be indicative of useful work being done on the nodes, since workers can do a timed spin-wait for new work. I can suggest two avenues to achieve better load balance with mpiblast 1.4.0. First, partition the database into more fragments, possibly two or three times as many as you currently have. Second, use the
> : > : >
> : > : > You mean more fragments, which in turn means using more nodes? Actually, at our cluster no more than 44 nodes are allowed for parallel jobs.
> : > :
> : > : No, it's not necessary to run on more nodes when creating more fragments. mpiblast 1.4.0 needs at least as many fragments as nodes when --db-replicate-count=1 (the default value). When there are more fragments than nodes, mpiblast will happily distribute the extra fragments among the nodes.
> : > :
> : > : > : --db-replicate-count option to mpiblast. The default value for the db-replicate-count is 1, which indicates that mpiblast will distribute a single copy of your database across the worker nodes. For your setup, each node was probably getting a single fragment. By setting
> : > : >
> : > : > Isn't it right that each node gets a single fragment of the target database (the number of nodes assigned for mpiblast = number of fragments + 2), so that the whole query dataset can be searched against that fragment on each node, with the effective search space calculation done before starting the search to give blast-comparable e-values?
> : > :
> : > : The search space calculation happens on the rank 0 process and is totally unrelated to the number of nodes and the number of DB fragments. The most basic mpiblast setup has one fragment per node, but when load balancing is desirable, as in your case, mpiblast can be configured to use multiple fragments per node. This will not affect the e-value calculation.
> : > :
> : > : > : --db-replicate-count to something like 5, each fragment would be copied to five different compute nodes, and thus five nodes would be available to search fragments that happen to have lots of hits. In the extreme
> : > : >
> : > : > You mean this way nodes would be busy searching the query dataset against the same fragment on 5 compute nodes? Is this just a way to keep the nodes busy until all of them complete their searches?
> : > :
> : > : Yes, this will balance the load and will probably speed up your search.
> : > :
> : > : > : case you could set --db-replicate-count equal to the number of fragments, which would be fine if per-node memory and disk space is substantially larger than the total size of the formatted database.
> : > : >
> : > : > Is it possible in mpiblast, for cases where the query dataset is as large as the target dataset, to fragment the query dataset instead, keep the target dataset in the global/shared area, and run the searches on single nodes (with the number of nodes equal to the number of query fragments)? That way there would be no need to calculate the effective search space, since every search job sees the same target dataset. By following this approach I managed to complete this job using standard blast in < 24 hrs.
> : > :
> : > : The parallelization approach you describe is perfectly reasonable when the total database size is less than the core memory size on each node. With a properly configured --db-replicate-count, I would guess that mpiblast could approach the 24-hour mark, although it may take slightly longer, since there are various overheads involved in copying fragments and in the serial computation of the effective search space.
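For reference, the query-splitting approach described above can be sketched roughly as follows. This is an illustration only, not something mpiblast does itself: the chunk count, file names, and scheduler line are placeholders, and it assumes plain NCBI blastall run against a database that has already been formatted once in the shared area.

    # split the query FASTA round-robin into one chunk per node (gawk assumed,
    # since up to 40 output files are held open at once)
    awk -v n=40 'BEGIN{RS=">"} NR>1 {f=sprintf("query.%03d.fas",(NR-2)%n); printf ">%s",$0 >> f}' queries.fas
    # then run an ordinary serial blastall on each chunk, one chunk per node
    for f in query.*.fas; do
        qsub -cwd -b y "blastall -p blastn -d /shared/nt -i $f -o $f.out"   # scheduler syntax is site-specific
    done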
> : > :
> : > : > : In your particular situation, it may also help to randomize the order of sequences in the database to minimize "fragment hotspots", which could result from a database self-search.
> : > : >
> : > : > I did not get the "fragment hotspots" bit here. By randomizing the order of the sequences, do you mean that each node would then take a similar amount of time to finish its searches? Otherwise the number of hits could be lower for some fragments than for others, and this would end up in different completion times on different nodes?
> : > :
> : > : Right, the goal is to get the per-fragment search time more balanced through randomization. But after thinking about it a bit more, I'm not sure just how much this would save....
> : > :
> : > : > : At the moment mpiblast doesn't have code to accomplish such a feat, but I think others (Jason Gans?) have written code for this in the past.
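One way to do that shuffling outside of mpiblast is to linearize each FASTA record onto a single line, shuffle the lines, and expand them back before formatting the database. A rough sketch only: it assumes GNU shuf is available, that sequence headers contain no literal tab characters, and the file names are placeholders.

    # one record per line (header and sequence joined by tabs), shuffle, restore newlines
    awk 'BEGIN{RS=">"} NR>1 {gsub("\n","\t"); sub("\t$",""); print ">"$0}' nt.fas \
        | shuf \
        | tr '\t' '\n' > nt_shuffled.fas
    # then run mpiformatdb on nt_shuffled.fas as usual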
> : > : >
> : > : > Aaron, do you think SCore-based MPI communication may be delaying the overall running time of the mpiblast searches?
> : > :
> : > : It's possible. The interprocess communication in 1.4.0 was fine-tuned for the default mpich2 1.0.2 and LAM/MPI implementations. We use various combinations of the non-blocking MPI_Issend() and MPI_Irecv() calls and the blocking send/recv API in mpiblast 1.4.0. I have no idea how it would interact with SCore.
> : > :
> : > : -Aaron
