Hi Intikhab...

intikhab alam wrote:
> : can take a long time to compute the effective search space required for
> : exact e-value calculation.  If that's the problem, then you would find
> : just one mpiblast process consuming 100% cpu on the rank 0 node for
> : hours or days, without any output.
>
> Is the effective search space calculation done on the master node? If 
> yes, this mpiblast job stayed on the master node for some hours, and 
> then all the compute nodes became busy with >90% usage, with output 
> being generated continuously until the 12th day, when I killed the job.
>   

Yes, the search space calculation is done on the master node, and it 
sounds like using the --fast-evalue-approximation command-line switch 
would save you a few hours, which is pretty small compared to the weeks 
or months that the rest of the search is taking.
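
For what it's worth, a minimal sketch of what adding that switch to an 
mpiblast run might look like (the process count, BLAST program, database 
name, query file, and output path below are placeholders for 
illustration, not your actual setup):

    # approximate the e-value search space instead of computing it exactly
    mpirun -np 44 mpiblast -p blastp -d mydb -i queries.fasta \
        -o results.txt --fast-evalue-approximation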

> :
> : The more likely limiting factor is load imbalance on the cluster.
>
>
> In this case, do you think the job should finish on some nodes earlier 
> than others? In my case the job was running on all the nodes with >90% 
> usage, and the last output I got was on the last day, when I killed 
> the job.
>   
It's possible the other nodes may continue running mpiblast workers 
which are waiting to send results back to the mpiblast writer process.

> : If some database fragments happen to have a large number of hits and
> : others have few, and the database is distributed as one fragment per
> : node, then the computation may be heavily imbalanced and may run quite
> : slowly.  CPU consumption as given by a CPU monitoring tool may not be
> : indicative of useful work being done on the nodes since workers can do a
> : timed spin-wait for new work.
> : I can suggest two avenues to achieve better load balance with mpiblast
> : 1.4.0.  First, partition the database into more fragments, possibly two
> : or three times as many as you currently have.  Second, use the
>
> You mean more fragments, which in turn means using more nodes? Actually, 
> on our cluster no more than 44 nodes are allowed for parallel jobs.
>   
No, it's not necessary to run on more nodes when creating more 
fragments.  mpiblast 1.4.0 needs at least as many fragments as nodes 
when --db-replicate-count=1 (the default value).  When there are more 
fragments than nodes, mpiblast will happily distribute the extra 
fragments among the nodes.
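
As a rough sketch, re-partitioning the database into roughly three times 
as many fragments as nodes might look like the following (this assumes 
mpiformatdb's --nfrags option; the FASTA file name and the -p 
protein/nucleotide setting are placeholders):

    # split the formatted database into 132 fragments (~3 per node on 44 nodes)
    mpiformatdb --nfrags=132 -i mydb.fasta -p T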

> : --db-replicate-count option to mpiblast.  The default value for the
> : db-replicate-count is 1, which indicates that mpiblast will distribute a
> : single copy of your database across worker nodes.  For your setup, each
> : node was probably getting a single fragment.  By setting
>
>
> Isn't it right for each node to get a single fragment of the target 
> database (the number of nodes assigned to mpiblast = number of 
> fragments + 2), so that the whole query dataset can be searched against 
> one fragment on each node (with the effective search space calculation 
> done before starting the search, for blast-comparable e-values)?
>   
The search space calculation happens on the rank 0 process and is 
completely unrelated to the number of nodes and the number of DB 
fragments.  The most basic mpiblast setup has one fragment per node, but 
when load balancing is desirable, as in your case, mpiblast can be 
configured to use multiple fragments per node.  This will not affect the 
e-value calculation.

>
> : --db-replicate-count to something like 5, each fragment would be copied
> : to five different compute nodes, and thus five nodes would be available
> : to search fragments that happen to have lots of hits.  In the extreme
>
> You mean this way nodes would be busy searching the query dataset 
> against the same fragment on 5 compute nodes? Is this just a way to 
> keep the nodes busy until all the nodes complete the searches?
>   
Yes, this will balance the load and will probably speed up your search.
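A sketch of what such a run might look like (again, the process count, 
BLAST parameters, and file names are placeholders; the only new piece is 
the replicate count):

    # place each database fragment on five workers so that "hot"
    # fragments can be searched by several nodes at once
    mpirun -np 44 mpiblast -p blastp -d mydb -i queries.fasta \
        -o results.txt --db-replicate-count=5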

> : case you could set --db-replicate-count equal to the number of
> : fragments, which would be fine if per-node memory and disk space is
> : substantially larger than the total size of the formatted database.
> :
>
> Is it possible in mpiblast, for cases where the size of the query 
> dataset equals the size of the target dataset, to fragment the query 
> dataset instead, keep the target dataset in a global/shared area, and 
> run the searches on single nodes (the number of nodes equal to the 
> number of query dataset fragments)? That way there would be no need to 
> calculate the effective search space, since all the search jobs see the 
> same target dataset. Following this approach, I managed to complete 
> this job using standard blast in < 24 hrs.
>   
The parallelization approach you describe is perfectly reasonable when 
the total database size is less than the core memory size on each node.  
With a properly configured --db-replicate-count, I would guess that 
mpiblast could approach the 24 hour mark, although it may take slightly 
longer since there are various overheads involved in copying fragments 
and serially computing the effective search space.
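
For comparison, the query-splitting scheme you describe with standard 
NCBI blastall might be sketched as below (the chunk file names and 
shared database path are placeholders; in practice each chunk would be 
submitted to the scheduler as its own single-node job rather than run in 
a local loop):

    # search each query chunk against the full, shared database so that
    # every job sees the same effective search space
    for chunk in query_chunk_*.fasta; do
        blastall -p blastp -d /shared/mydb -i "$chunk" -o "${chunk%.fasta}.blast"
    done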


> :
> : In your particular situation, it may also help to randomize the order of
> : sequences in the database to minimize "fragment hotspots" which could
> : result from a database self-search.
>
> I did not get the "fragment hotspots" bit here. By randomizing the 
> order of sequences, you mean each node would take a similar amount of 
> time to finish its searches? Otherwise the number of hits could be 
> lower for some fragments than for others, and this would end up with 
> different completion times on different nodes?
>   
Right, the goal is to get the per-fragment search time more balanced 
through randomization.  But after thinking about it a bit more, I'm not 
sure just how much this would save....
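
If you do want to try it, one quick way to shuffle the record order in a 
FASTA file before re-running mpiformatdb is sketched below (this assumes 
GNU shuf is available; the file names are placeholders, and multi-line 
sequences end up on a single line, which is still valid FASTA):

    # flatten each record to "header<TAB>sequence", shuffle the records,
    # then restore the header/sequence line structure
    awk '/^>/ {printf("%s%s\t", (NR>1 ? "\n" : ""), $0); next}
         {printf("%s", $0)}
         END {print ""}' mydb.fasta \
        | shuf | tr '\t' '\n' > mydb_shuffled.fasta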

>
> : At the moment mpiblast doesn't have
> : code to accomplish such a feat, but I think others (Jason Gans?) have
> : written code for this in the past.
>
> Aaron, do you think SCore-based MPI communication may be adding to the 
> overall running time of the mpiblast searches?
>   
It's possible.
The interprocess communication in 1.4.0 was fine-tuned for the default 
MPICH2 1.0.2 and LAM/MPI implementations.  We use various combinations 
of the non-blocking MPI_Issend() and MPI_Irecv() calls and the blocking 
send/recv API in mpiblast 1.4.0.  I have no idea how it would interact 
with SCore.

-Aaron

