Hi Aaron, I would like to get the benchmark dataset so that I can test mpiblast performance. Could you please point me to it? In the meantime I am trying to get mpich running on the cluster.
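Roughly what I am trying for the mpich setup, in case it matters (mpich2's mpd-based startup; the host file name and node count below are just placeholders for our cluster):

    mpdboot -n 41 -f mpd.hosts     # start an mpd ring across the compute nodes
    mpdtrace                       # check that every node has joined the ring
    mpiexec -n 41 hostname         # quick smoke test before attempting an mpiblast run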
Many Thanks,
Intikhab

----- Original Message -----
From: "intikhab alam" <[EMAIL PROTECTED]>
To: "Aaron Darling" <[EMAIL PROTECTED]>
Sent: Friday, March 02, 2007 1:20 PM
Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset

> Hi Aaron,
>
> I would like to try out the benchmark dataset. Could you point me to where I can download it?
>
> Intikhab
>
> ----- Original Message -----
> From: "Aaron Darling" <[EMAIL PROTECTED]>
> To: "intikhab alam" <[EMAIL PROTECTED]>
> Sent: Friday, March 02, 2007 6:21 AM
> Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset
>
> : It sounds like there must be something causing an mpiblast-specific communications bottleneck in your system. Anybody else have ideas here? If you're keen to verify that, you could run mpiblast on the benchmark dataset we were using on Green Destiny and compare runtimes. My latest benchmark data set (dated June 2005) has a runtime of about 16 minutes for 64 nodes to search the 300K erwinia query set against the first 14 GB of nt using blastn. Each compute node in that machine was a 667 MHz Transmeta chip with 640 MB RAM, connected via 100 Mbit Ethernet. I was using mpich2-1.0.1, no SCore. Based on paper specs, your cluster should be quicker than that.
> :
> : On the other hand, if you've got wild amounts of load imbalance, --db-replicate-count=5 may not be enough, and 41 may prove ideal (where 41 = the number of nodes in your cluster). In that case, mpiblast will have effectively copied the entire database to each node, totally factoring load imbalance out of the compute-time equation. Your database is much smaller than each node's core memory, and a single fragment is probably much larger than each node's CPU cache, so I can't think of a good reason not to fully distribute the database, apart from the time it takes to copy DB fragments around.
> :
> : In any case, keep me posted if you discover anything.
> :
> : -Aaron
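Concretely, the two suggestions above (more database fragments than workers, and a higher replicate count) amount to something like the following. This is only a sketch: it assumes mpiblast 1.4.0 and an mpich2-style mpiexec, the file names and node counts are placeholders, and the mpiformatdb fragment option (--nfrags / -N) should be checked against the installed version's help output.

    # cut the database into roughly 3x as many fragments as there are worker nodes
    mpiformatdb --nfrags=123 -i nt.fas -p F
    # run with one full copy of the database per worker (41 workers here), using the
    # blastall-style -p/-i/-d/-o options that mpiblast passes through to BLAST
    mpiexec -n 43 mpiblast -p blastn -i queries.fas -d nt -o results.txt \
        --db-replicate-count=41    # 43 = 41 workers plus the master and writer processes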
> : intikhab alam wrote:
> : > Hi Aaron,
> : >
> : > As per your suggestion, I used the following option:
> : >
> : > --db-replicate-count=5
> : >
> : > assuming it might help reach the 24-hour mark for completing the job. However, I see that only 6% of the (total estimated) output has been generated so far, i.e. after 4 days (4 * 24 hrs). If it continues this way, my mpiblast run would finish in 64 days. Any other suggestions to improve the running time?
> : >
> : > Intikhab
> : >
> : > ----- Original Message -----
> : > From: "Aaron Darling" <[EMAIL PROTECTED]>
> : > To: "intikhab alam" <[EMAIL PROTECTED]>; <[email protected]>
> : > Sent: Wednesday, February 21, 2007 1:33 AM
> : > Subject: Re: [Mpiblast-users] blast in 1 day but could not get mpiblast done even in 10 days for the same dataset
> : >
> : > : Hi Intikhab...
> : > :
> : > : intikhab alam wrote:
> : > : > : can take a long time to compute the effective search space required for exact e-value calculation. If that's the problem, then you would find just one mpiblast process consuming 100% CPU on the rank 0 node for hours or days, without any output.
> : > : >
> : > : > Is the effective search space calculation done on the master node? If yes, this mpiblast job stayed at the master node for some hours, and then all the compute nodes were busy with >90% usage the whole time, with output being generated continuously until the 12th day, when I killed the job.
> : > :
> : > : Yes, the search space calculation is done on the master node, and it sounds like using the --fast-evalue-approximation command-line switch would save you a few hours, which is pretty small compared to the weeks or months that the rest of the search is taking.
> : > :
> : > : > : The more likely limiting factor is load imbalance on the cluster.
> : > : >
> : > : > In this case, do you think the job should finish on some nodes earlier than on others? In my case the job was running on all the nodes with >90% usage, and the last output I got was on the final day, when I killed the job.
> : > :
> : > : It's possible the other nodes may continue running mpiblast workers which are waiting to send results back to the mpiblast writer process.
> : > :
> : > : > : If some database fragments happen to have a large number of hits and others have few, and the database is distributed as one fragment per node, then the computation may be heavily imbalanced and may run quite slowly. CPU consumption as reported by a CPU monitoring tool may not be indicative of useful work being done on the nodes, since workers can do a timed spin-wait for new work. I can suggest two avenues to achieve better load balance with mpiblast 1.4.0. First, partition the database into more fragments, possibly two or three times as many as you currently have. Second, use the
> : > : >
> : > : > You mean more fragments, which in turn means using more nodes? Actually, at our cluster no more than 44 nodes are allowed for parallel jobs.
> : > :
> : > : No, it's not necessary to run on more nodes when creating more fragments. mpiblast 1.4.0 needs at least as many fragments as nodes when --db-replicate-count=1 (the default value). When there are more fragments than nodes, mpiblast will happily distribute the extra fragments among the nodes.
> : > :
> : > : > : --db-replicate-count option to mpiblast. The default value for the db-replicate-count is 1, which indicates that mpiblast will distribute a single copy of your database across the worker nodes. For your setup, each node was probably getting a single fragment. By setting
> : > : >
> : > : > Isn't it right that each node gets a single fragment of the target database (the number of nodes assigned for mpiblast = number of fragments + 2), so that the whole query dataset can be searched against that fragment on each node, with the effective search space calculation done before starting the search to give blast-comparable e-values?
> : > :
> : > : The search space calculation happens on the rank 0 process and is totally unrelated to the number of nodes and the number of DB fragments. The most basic mpiblast setup has one fragment per node, but when load balancing is desirable, as in your case, mpiblast can be configured to use multiple fragments per node. This will not affect the e-value calculation.
> : > :
> : > : > : --db-replicate-count to something like 5, each fragment would be copied to five different compute nodes, and thus five nodes would be available to search fragments that happen to have lots of hits. In the extreme
> : > : >
> : > : > You mean this way nodes would be busy searching the query dataset against the same fragment on 5 compute nodes? Is this just a way to keep the nodes busy until all of them complete their searches?
> : > :
> : > : Yes, this will balance the load and will probably speed up your search.
> : > :
> : > : > : case you could set --db-replicate-count equal to the number of fragments, which would be fine if per-node memory and disk space is substantially larger than the total size of the formatted database.
> : > : >
> : > : > Is it possible in mpiblast, for cases where the query dataset is as large as the target dataset, to fragment the query dataset instead, keep the target dataset in the global/shared area, and run the searches on single nodes (with the number of nodes equal to the number of query fragments)? That way there would be no need to calculate the effective search space, since every search job sees the same target dataset. By following this approach I managed to complete this job using standard blast in < 24 hrs.
> : > :
> : > : The parallelization approach you describe is perfectly reasonable when the total database size is less than the core memory size on each node. With a properly configured --db-replicate-count, I would guess that mpiblast could approach the 24-hour mark, although it may take slightly longer, since there are various overheads involved in copying fragments and in the serial computation of the effective search space.
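For reference, the query-splitting approach described above can be sketched roughly as follows. This is an illustration only, not something mpiblast does itself: the chunk count, file names, and scheduler line are placeholders, and it assumes plain NCBI blastall run against a database that has already been formatted once in the shared area.

    # split the query FASTA round-robin into one chunk per node (gawk assumed,
    # since up to 40 output files are held open at once)
    awk -v n=40 'BEGIN{RS=">"} NR>1 {f=sprintf("query.%03d.fas",(NR-2)%n); printf ">%s",$0 >> f}' queries.fas
    # then run an ordinary serial blastall on each chunk, one chunk per node
    for f in query.*.fas; do
        qsub -cwd -b y "blastall -p blastn -d /shared/nt -i $f -o $f.out"   # scheduler syntax is site-specific
    done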
> : > :
> : > : > : In your particular situation, it may also help to randomize the order of sequences in the database to minimize "fragment hotspots", which could result from a database self-search.
> : > : >
> : > : > I did not get the "fragment hotspots" bit here. By randomizing the order of the sequences, do you mean that each node would then take a similar amount of time to finish its searches? Otherwise the number of hits could be lower for some fragments than for others, and this would end up in different completion times on different nodes?
> : > :
> : > : Right, the goal is to get the per-fragment search time more balanced through randomization. But after thinking about it a bit more, I'm not sure just how much this would save....
> : > :
> : > : > : At the moment mpiblast doesn't have code to accomplish such a feat, but I think others (Jason Gans?) have written code for this in the past.
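One way to do that shuffling outside of mpiblast is to linearize each FASTA record onto a single line, shuffle the lines, and expand them back before formatting the database. A rough sketch only: it assumes GNU shuf is available, that sequence headers contain no literal tab characters, and the file names are placeholders.

    # one record per line (header and sequence joined by tabs), shuffle, restore newlines
    awk 'BEGIN{RS=">"} NR>1 {gsub("\n","\t"); sub("\t$",""); print ">"$0}' nt.fas \
        | shuf \
        | tr '\t' '\n' > nt_shuffled.fas
    # then run mpiformatdb on nt_shuffled.fas as usual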
> : > : >
> : > : > Aaron, do you think SCore-based MPI communication may be delaying the overall running time of the mpiblast searches?
> : > :
> : > : It's possible. The interprocess communication in 1.4.0 was fine-tuned for the default mpich2 1.0.2 and LAM/MPI implementations. We use various combinations of the non-blocking MPI_Issend() and MPI_Irecv() calls and the blocking send/recv API in mpiblast 1.4.0. I have no idea how it would interact with SCore.
> : > :
> : > : -Aaron
