Re: [Mpiblast-users] File systems that work well with mpiBLAST-pio

Heshan Lin Tue, 13 Mar 2007 14:51:16 -0800

Ravi,

The parallel overhead of mpiBLAST/pio tends to increase as the database is
split into more fragments. The more expensive the BLAST search is, the more
likely it is to benefit from a larger number of fragments to reduce the
computation time. Given the short query length and low e-value you were
using, I would guess that the BLAST search time of a query against 1/30 of nt
(30 fragments on 32 processors in your case) is already quite small; beyond
that point, the parallel overhead offsets any further reduction in search time.
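
To make this trade-off concrete, here is a toy cost model in Python. It is purely an illustrative assumption, not something measured from or implemented in mpiBLAST: the per-query time with f fragments is modelled as W/f (the search work W split f ways) plus c*f (a hypothetical per-fragment overhead c for scheduling, merging and output). Minimising W/f + c*f gives f* = sqrt(W/c), so a more expensive search (larger W) pays off with more fragments. The function names are hypothetical.

```python
import math


def query_time(search_work: float, overhead_per_fragment: float, fragments: int) -> float:
    """Toy model (assumption): search work divided across fragments,
    plus an overhead that grows linearly with the fragment count."""
    if fragments < 1:
        raise ValueError("need at least one fragment")
    return search_work / fragments + overhead_per_fragment * fragments


def best_fragment_count(search_work: float, overhead_per_fragment: float) -> float:
    """Continuous minimiser of the toy model: setting d/df = 0 gives sqrt(W / c)."""
    return math.sqrt(search_work / overhead_per_fragment)


if __name__ == "__main__":
    for work in (900.0, 3600.0):  # arbitrary units: a cheap and an expensive search
        f_star = best_fragment_count(work, 1.0)
        print(work, f_star, query_time(work, 1.0, round(f_star)))
```

With W = 900 and c = 1 (arbitrary units) the model favours about 30 fragments, and 60 fragments is already slower (75 vs 60 time units); quadrupling W to 3600 moves the optimum to about 60 fragments.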


But this does NOT necessarily mean that you can't benefit from using a large
number of processors in your search. mpiBLAST 1.4/pio supports a combination
of query and database segmentation, so you don't always have to partition the
database into n-2 fragments on n processors. In your 64-processor case, one
configuration would be to partition the database into 31 fragments and use
the "--db-replicate-count=2" runtime option to tell the master to distribute
2 replicas of the nt database to the workers.
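
The processor arithmetic for the 64-processor example can be written down as a small Python helper. It assumes, following the n-2 convention from Ravi's message, that two ranks do not search, so the remaining worker ranks are split evenly across the replicas. The helper names are hypothetical; only the "--db-replicate-count" option comes from this thread.

```python
def fragments_per_replica(processors: int, replicas: int, non_worker_ranks: int = 2) -> int:
    """Number of DB fragments to format so that fragments * replicas workers fit.

    Assumption: `non_worker_ranks` ranks (2, matching the n-2 convention in
    this thread) do not run searches themselves.
    """
    workers = processors - non_worker_ranks
    if replicas < 1 or workers < replicas:
        raise ValueError("not enough worker ranks for the requested replicas")
    return workers // replicas


def replicate_option(replicas: int) -> str:
    """Runtime option telling the master how many DB replicas to distribute."""
    return f"--db-replicate-count={replicas}"


if __name__ == "__main__":
    print(fragments_per_replica(64, 2), replicate_option(2))
```

For 64 processors and 2 replicas this gives 31 fragments (62 workers); for 32 processors and 1 replica it gives the 30 fragments already in use.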

On a related note, increasing the number of DB replicas may increase the time
spent distributing DB fragments. By default, mpiBLAST 1.4/pio distributes one
DB fragment at a time; you can enable concurrent fragment distribution by
tuning the "--concurrent" option according to the I/O capability of your
shared storage system.
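
As a rough illustration of what a concurrency limit on fragment distribution does, the Python sketch below copies every (fragment, replica) pair through a thread pool capped at `concurrent` simultaneous copies, with `concurrent=1` matching the one-at-a-time default. It is a hypothetical model, not mpiBLAST's distribution code: the sleep stands in for reading a fragment from shared storage, and the real "--concurrent" option is only assumed to act as a similar cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def distribute(fragments: int, replicas: int, concurrent: int = 1) -> tuple:
    """Copy each (fragment, replica) pair with at most `concurrent` copies in flight.

    Returns (number_of_copies, peak_simultaneous_copies_observed).
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def copy(job):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stand-in for reading one fragment from shared storage
        with lock:
            in_flight -= 1
        return job

    jobs = [(f, r) for f in range(fragments) for r in range(replicas)]
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        done = list(pool.map(copy, jobs))
    return len(done), peak


if __name__ == "__main__":
    print(distribute(31, 2, concurrent=4))
```

Raising the limit shortens distribution only while the shared storage can actually serve that many reads at once, which is why the value should be tuned to its I/O capability.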

Hope this helps,
Heshan

> -----Original Message-----
> From: Ravi Vijaya Satya [Contractor, Foreign National] [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 12, 2007 6:07 PM
> To: 'Heshan Lin'; [email protected]
> Subject: RE: [Mpiblast-users] File systems that work well with mpiBLAST-pio
> 
> Heshan,
> 
> Thanks for the detailed reply.
> 
> Our queries are short (35-40 bp) DNA sequences from nt. The database is the
> entire nt database. We get around 500-1000 hits for each query, since we
> run BLAST with very low e-value thresholds.
> 
> What we observed is that the execution times increase when we go beyond 32
> processors. This increase was somewhat smaller for mpiBLAST-pio than for
> mpiBLAST, but run times still increased rather than decreased. When using
> 32 or fewer processors, the run times for mpiBLAST and mpiBLAST-pio are very
> similar. If we are running on n processors, we divide the database into n-2
> chunks.
> 
> Thanks,
> Ravi
> 
> -----Original Message-----
> From: Heshan Lin [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 12, 2007 1:47 AM
> To: [EMAIL PROTECTED]; [email protected]
> Subject: RE: [Mpiblast-users] File systems that work well with mpiBLAST-pio
> 
> Hi Ravi,
> 
> In the paper, pioBLAST was compared with mpiBLAST 1.2; mpiBLAST 1.4 has
> improved performance a lot since then =). Besides, mpiBLAST-pio does not
> work exactly the same way as pioBLAST; please refer to the following message
> I posted to the mailing list earlier for their differences:
> http://sourceforge.net/mailarchive/forum.php?thread_id=9626643&forum_id=43689
> 
> Currently mpiBLAST-pio provides two output options.
> 
> 1) parallel-write. This is a more efficient output strategy, which requires
> special support from parallel file systems to ensure correct results.
> It has been tested on PVFS2 and SGI XFS. I don't have access to other
> parallel file systems, but the parallel-write strategy should work on file
> systems that support the level-2 write access (independent, non-contiguous
> write) described in the following MPI-IO paper:
> Rajeev Thakur, William Gropp, and Ewing Lusk. "A case for using MPI's
> derived datatypes to improve I/O performance". In Proceedings of SC98:
> High Performance Networking and Computing, November 1998.
> 
> 2) master-write. This output strategy is less scalable, but it does not
> require special support from file systems, and it is recommended on systems
> with NFS.
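
The difference between the two output strategies can be sketched in Python on a single machine. This is a simplified model under stated assumptions, not mpiBLAST-pio's code: each rank's output for each query is treated as an opaque byte blob, the output file is ordered by query and then by rank (the real tool merges hits by score), threads stand in for MPI ranks, and POSIX `os.pwrite` (Unix only) stands in for MPI-IO. In `parallel_write`, offsets come from an exclusive prefix sum of the output sizes, and every rank then writes its own scattered, non-contiguous pieces of the shared file independently, the access pattern described as level-2 above; `master_write` funnels everything through one writer. All function names are hypothetical.

```python
import os
import tempfile
import threading


def compute_offsets(sizes):
    """Exclusive prefix sum over sizes[q][r] (bytes rank r produced for query q),
    in query-major order. Returns (offsets[q][r], total_bytes)."""
    offsets, pos = [], 0
    for row in sizes:
        row_offsets = []
        for size in row:
            row_offsets.append(pos)
            pos += size
        offsets.append(row_offsets)
    return offsets, pos


def master_write(path, results):
    """Master-write: one process receives every blob and writes them in order."""
    with open(path, "wb") as out:
        for row in results:
            for blob in row:
                out.write(blob)


def _pwrite_all(fd, data, offset):
    """Write all of `data` at `offset`, retrying if pwrite returns a short count."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def parallel_write(path, results):
    """Parallel-write: offsets are computed once from the sizes, then every
    'rank' (a thread here) writes its own non-contiguous pieces independently."""
    sizes = [[len(blob) for blob in row] for row in results]
    offsets, total = compute_offsets(sizes)
    n_ranks = len(results[0]) if results else 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, total)  # pre-size the shared output file

        def rank_writer(rank):
            for q, row in enumerate(results):
                _pwrite_all(fd, row[rank], offsets[q][rank])

        threads = [threading.Thread(target=rank_writer, args=(r,)) for r in range(n_ranks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)


if __name__ == "__main__":
    # results[q][r]: toy output of rank r for query q
    results = [[b"q0-r0;", b"q0-r1;"], [b"q1-r0;", b""], [b"", b"q2-r1;"]]
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "master.out")
        b = os.path.join(tmp, "parallel.out")
        master_write(a, results)
        parallel_write(b, results)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            print(fa.read() == fb.read())  # True
```

On one local disk both functions produce identical files. The parallel version avoids funnelling all output through one process, but it relies on the file system correctly handling concurrent writes to disjoint regions of one file, which is why master-write is the safer choice on NFS.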
> 
> The performance difference between mpiBLAST 1.4 and mpiBLAST-pio depends
> largely on the characteristics of the query set and the database. In our
> experience, even with the master-write output option, mpiBLAST-pio shows a
> significant performance improvement when searching queries against a large
> database with a large output volume (e.g. searching sequences randomly
> sampled from the nt database against nt itself). However, when searching
> queries that produce a small amount of output, mpiBLAST 1.4 and
> mpiBLAST-pio deliver similar search throughput.
> 
> Which database and query set were you using for the performance comparison?
> 
> Thanks,
> Heshan
> 
> ________________________________________
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Vijaya Satya [Contractor, Foreign National]
> Sent: Friday, March 09, 2007 12:41 PM
> To: [email protected]
> Subject: [Mpiblast-users] File systems that work well with mpiBLAST-pio
> 
> I was wondering whether mpiBLAST-pio requires any particular file system
> features to give better performance. In the Lin et al. paper on pioBLAST, it
> was shown that pioBLAST performs better than mpiBLAST. However, I could not
> see any significant improvement in performance over mpiBLAST (1.4.0) using
> mpiBLAST-pio.
> 
> Can anyone list some file systems that have the parallel I/O support
> necessary for mpiBLAST-pio? Is Lustre one such file system?
> 
> Thanks,
> Ravi
> 



_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
