I think this may have already been asked, but I don't recall a response. Run
with --debug=/tmp/mylog.
Either keep an eye on a 'tail -f' on the master, the scheduler, and a few of
the workers, or do the more tedious work of figuring out which steps are
taking the longest.
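For example, something along these lines (the MPI launcher, process count,
and query/database names are placeholders for whatever you normally run;
--debug is the only option being suggested here):

  # run mpiblast with debug logging written to /tmp/mylog on each node
  mpirun -np 44 mpiblast -p blastn -d nt -i queries.fas -o results.txt \
      --debug=/tmp/mylog

  # then, on the master, the scheduler, and a few workers:
  tail -f /tmp/mylog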
I've been thinking about kicking out a little visualization of the logs; has
anyone already done anything similar?
--
Mike Cariaso * Bioinformatics Software * http://cariaso.com
----- Original Message ----
From: Aaron Darling <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, March 2, 2007 4:01:12 AM
Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done
even in 10 days for the same dataset
It sounds like there must be something causing an mpiblast-specific
communications bottleneck in your system. Anybody else have ideas
here? If you're keen to verify that, you could run mpiblast on the
benchmark dataset we were using on Green Destiny and compare runtimes.
My latest benchmark data set (dated June 2005) has a runtime of about 16
minutes for 64 nodes to search the 300K erwinia query set against the
first 14GB of nt using blastn. Each compute node in that machine was a
667MHz Transmeta chip, 640MB RAM, connected via 100Mbit Ethernet. I was
using mpich2-1.0.1, no SCore. Based on paper specs, your cluster should
be quicker than that.
On the other hand, if you've got wild amounts of load imbalance,
--db-replicate-count=5 may not be enough, and 41 may prove ideal (where
41 = the number of nodes in your cluster). In that case, mpiblast will
have effectively copied the entire database to each node, totally
factoring out load imbalance from the compute time equation. Your
database is much smaller than each node's core memory, and a single
fragment is probably much larger than each node's CPU cache, so I can't
think of a good reason not to fully replicate the database, apart from
the time it takes to copy DB fragments around.
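As a rough sketch, the full-replication run would look something like this
(the query/database/output names and the process count are placeholders; only
--db-replicate-count is the option under discussion):

  # replicate every fragment to all 41 nodes, trading copy time for balance
  mpirun -np 41 mpiblast -p blastn -d yourdb -i query.fas -o results.txt \
      --db-replicate-count=41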
In any case, keep me posted if you discover anything.
-Aaron
intikhab alam wrote:
> Hi Aaron,
>
> As per your suggestion, I used the following option:
>
> --db-replicate-count=5
>
> assuming it might help reach the 24-hour mark to complete the job.
> However, I see that only 6% of the (total estimated) output has been
> generated so far, i.e. after 4 days (4*24 hrs). If I continue this
> way, my mpiblast run would finish in 64 days. Any other suggestions to
> improve the running time?
>
> Intikhab
> ----- Original Message -----
> From: "Aaron Darling" <[EMAIL PROTECTED]>
> To: "intikhab alam" <[EMAIL PROTECTED]>;
> <[email protected]>
> Sent: Wednesday, February 21, 2007 1:33 AM
> Subject: Re: [Mpiblast-users] blast in 1 day but could not get
> mpiblast done even in 10 days for the same dataset
>
>
> : Hi Intikhab...
> :
> : intikhab alam wrote:
> : > : can take a long time to compute the effective search space required
> : > : for exact e-value calculation. If that's the problem, then you would
> : > : find just one mpiblast process consuming 100% cpu on the rank 0 node
> : > : for hours or days, without any output.
> : >
> : > Is the effective search space calculation done on the master node? If
> : > yes, this mpiblast job stayed at the master node for some hours and
> : > then all the compute nodes got busy with >90% usage all the time with
> : > continued output being generated until the 12th day when I killed the
> : > job.
> : >
> :
> : yes, the search space calculation is done on the master node and it
> : sounds like using the --fast-evalue-approximation command-line switch
> : would save you a few hours, which is pretty small compared to the weeks
> : or months that the rest of the search is taking.
> :
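> : As a rough command-line sketch (everything except
> : --fast-evalue-approximation is a placeholder for your existing
> : arguments):
> :
> :   mpirun -np 44 mpiblast -p blastn -d yourdb -i query.fas -o results.txt \
> :       --db-replicate-count=5 --fast-evalue-approximation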
> : > :
> : > : The more likely limiting factor is load imbalance on the cluster.
> : >
> : >
> : > In this case, do you think the job should finish on some nodes earlier
> : > than others? In my case the job was running on all the nodes with >90%
> : > usage, and the last output I got was on the last day, when I killed the
> : > job.
> : >
> : It's possible the other nodes may continue running mpiblast workers
> : which are waiting to send results back to the mpiblast writer process.
> :
> : > : If some database fragments happen to have a large number of hits and
> : > : others have few, and the database is distributed as one fragment per
> : > : node, then the computation may be heavily imbalanced and may run quite
> : > : slowly. CPU consumption as given by a CPU monitoring tool may not be
> : > : indicative of useful work being done on the nodes since workers can do
> : > : a timed spin-wait for new work.
> : > : I can suggest two avenues to achieve better load balance with mpiblast
> : > : 1.4.0. First, partition the database into more fragments, possibly two
> : > : or three times as many as you currently have. Second, use the
> : >
> : > You mean more fragments, which in turn means using more nodes?
> : > Actually, at our cluster not more than 44 nodes are allowed for
> : > parallel jobs.
> : >
> : no, it's not necessary to run on more nodes when creating more
> : fragments. mpiblast 1.4.0 needs at least as many fragments as nodes
> : when --db-replicate-count=1 (the default value).
> : when there are more fragments than nodes, mpiblast will happily
> : distribute the extra fragments among the nodes.
> :
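> : If you do decide to try more fragments, that happens at format time with
> : mpiformatdb, roughly like the sketch below (the fragment-count option and
> : file names here are from memory, so check mpiformatdb --help for the
> : exact syntax in your version; 100 is just an illustrative number larger
> : than your 44-node limit):
> :
> :   # reformat the nucleotide database into ~100 fragments
> :   mpiformatdb -N 100 -i yourdb.fas -p F
> :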
> : > : --db-replicate-count option to mpiblast. The default value for the
> : > : db-replicate-count is 1, which indicates that mpiblast will distribute
> : > : a single copy of your database across worker nodes. For your setup,
> : > : each node was probably getting a single fragment. By setting
> : >
> : >
> : > Is it not right if each single node gets a single fragment of the
> : > target database (the number of nodes assigned to mpiblast = number of
> : > fragments + 2), so that the whole query dataset is searched against
> : > the fragment on each single node (with the effective search space
> : > calculation done before starting the search, for BLAST-comparable
> : > e-values)?
> : >
> : the search space calculation happens on the rank 0 process and is
> : totally unrelated to the number of nodes and number of DB fragments.
> : The most basic mpiblast setup has one fragment per node, but when
> : load-balancing is desirable, as in your case, mpiblast can be configured
> : to use multiple fragments per node. This will not affect the e-value
> : calculation.
> :
> : >
> : > : --db-replicate-count to something like 5, each fragment would be
> : > : copied to five different compute nodes, and thus five nodes would be
> : > : available to search fragments that happen to have lots of hits. In
> : > : the extreme
> : >
> : > You mean this way nodes would be busy searching the query dataset
> : > against the same fragment on 5 compute nodes? Is this just a way to
> : > keep the nodes busy until all the nodes complete the searches?
> : >
> : Yes, this will balance the load and will probably speed up your search.
> :
> : > : case you could set --db-replicate-count equal to the number of
> : > : fragments, which would be fine if per-node memory and disk space is
> : > : substantially larger than the total size of the formatted database.
> : > :
> : >
> : > Is it possible in mpiblast, for cases where the size of the query
> : > dataset is equal to the size of the target dataset, to fragment the
> : > query dataset instead, keep the target dataset in the global/shared
> : > area, and run the searches on single nodes (the number of nodes equal
> : > to the number of query dataset fragments)? This way there would be no
> : > need to calculate the effective search space, as all the search jobs
> : > get the same size of target dataset. By following this approach I
> : > managed to complete this job using standard BLAST in < 24 hrs.
> : >
> : The parallelization approach you describe is perfectly reasonable when
> : the total database size is less than the core memory size on each node.
> : With a properly configured --db-replicate-count, I would guess that
> : mpiblast could approach the 24-hour mark, although it may take slightly
> : longer since there are various overheads involved with copying of
> : fragments and serial computation of the effective search space.
> :
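> : For reference, the query-splitting scheme described above boils down to
> : running something like the following on each node (a sketch only; the
> : chunk file names are placeholders, and every node needs access to the
> : whole formatted database):
> :
> :   # node k searches its own slice of the query against the full database
> :   blastall -p blastn -d nt -i query.chunk_k.fas -o results.chunk_k.txt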
> :
> : > :
> : > : In your particular situation, it may also help to randomize the
> : > : order of sequences in the database to minimize "fragment hotspots"
> : > : which could result from a database self-search.
> : >
> : > I did not get the "fragment hotspots" bit here. By randomizing the
> : > order of sequences, you mean each node would possibly take a similar
> : > time to finish the searches? Otherwise the number of hits could be
> : > lower for some fragments than others, and this ends up in different
> : > job completion times on different nodes?
> : >
> : Right, the goal is to get the per-fragment search time more balanced
> : through randomization. But after thinking about it a bit more, I'm not
> : sure just how much this would save....
> :
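> : If you did want to experiment with it, a quick-and-dirty shuffle of a
> : FASTA file before reformatting could look like the sketch below (assumes
> : GNU shuf is available and that flattening each record onto a single
> : sequence line is acceptable):
> :
> :   # one tab-separated record per line, shuffle, then split back to FASTA
> :   awk '/^>/{if(s)print h"\t"s; h=$0; s=""; next}{s=s $0}END{if(s)print h"\t"s}' db.fas \
> :     | shuf \
> :     | awk -F'\t' '{print $1; print $2}' > db.shuffled.fas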
> : >
> : > : At the moment mpiblast doesn't have code to accomplish such a feat,
> : > : but I think others (Jason Gans?) have written code for this in the
> : > : past.
> : >
> : > Aaron, do you think SCore-based MPI communication may be adding to
> : > the overall running time of the mpiblast searches?
> : >
> : It's possible.
> : The interprocess communication in 1.4.0 was fine-tuned for default
> : mpich2 1.0.2 and LAM/MPI implementations. We use various combinations
> : of the non-blocking MPI_Issend(), MPI_Irecv(), and the blocking
> : send/recv API in mpiblast 1.4.0. I have no idea how it would interact
> : with SCore.
> :
> : -Aaron
> :
> :
>