Ravi,

The parallel overhead of mpiBLAST/pio tends to increase as the database is split into more fragments. The more expensive the BLAST search is, the more likely it is to benefit from a larger number of fragments to reduce the computation time. Given the short query length and low e-value you were using, I would guess that the BLAST search time of a query against 1/30 of nt (32 processors in your case) is already quite small; beyond that point, the parallel overhead would offset any further reduction in search time.
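To make the fragment arithmetic in this thread concrete, here is a minimal Python sketch. It assumes, following the n-2 rule used in this thread, that two of the n processes do not act as search workers and that each worker holds one copy of one fragment, so that fragments x replicas = n - 2. The function name fragment_count and its parameters are illustrative only and are not part of mpiBLAST.

```python
def fragment_count(num_procs: int, replicas: int = 1, reserved: int = 2) -> int:
    """Return how many database fragments fill the worker processes.

    Assumption (not taken from the mpiBLAST documentation): `reserved`
    processes do no searching, and every remaining worker holds one copy
    of one fragment.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    workers = num_procs - reserved
    if workers < replicas:
        raise ValueError("not enough worker processes for that many replicas")
    # Floor division: in this sketch, leftover workers get no fragment copy.
    return workers // replicas


if __name__ == "__main__":
    # (processes, replicas) pairs taken from the cases discussed in this thread.
    for procs, reps in [(32, 1), (64, 1), (64, 2)]:
        frags = fragment_count(procs, reps)
        print(f"{procs} processes, {reps} replica(s): {frags} fragments")
```

Under these assumptions, 32 processes with one replica give 30 fragments (each 1/30 of nt), and 64 processes with two replicas give 31 fragments, which is the configuration used with "--db-replicate-count=2" in this reply.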
But this does NOT necessarily mean that you can't benefit from using a large number of processors in your search. mpiBLAST 1.4/pio supports a combination of query and database segmentation, so you don't always have to partition the database into n-2 fragments on n processors. In your 64-processor case, one configuration would be to partition the database into 31 fragments and use the "--db-replicate-count=2" runtime option to tell the master to distribute 2 replicas of the nt database to the workers.

One related note: increasing the number of DB replicas may mean more time spent distributing DB fragments. By default mpiBLAST 1.4/pio distributes one DB fragment at a time; you can enable concurrent fragment distribution by tuning the "--concurrent" option according to the I/O capability of your shared storage system.

Hope this helps,
Heshan

> -----Original Message-----
> From: Ravi Vijaya Satya [Contractor, Foreign National] [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 12, 2007 6:07 PM
> To: 'Heshan Lin'; [email protected]
> Subject: RE: [Mpiblast-users] File systems that work well with mpiBLAST-pio
>
> Heshan,
>
> Thanks for the detailed reply.
>
> Our queries are short (35-40bp) DNA sequences from nt. The database is the
> entire nt database. We have around 500-1000 hits for each query, since we
> run BLAST with very low thresholds for e-value.
>
> What we observed is that the execution times increase when we go beyond 32
> processors. This increase was somewhat less for mpiBLAST-pio when compared
> to mpiBLAST, but it was still an increase in run times, rather than a
> decrease. When using 32 or fewer processors, the run times for mpiBLAST and
> mpiBLAST-pio are very similar. If we are running on n processors, we divide
> the database into n-2 chunks.
>
> Thanks,
> Ravi
>
> -----Original Message-----
> From: Heshan Lin [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 12, 2007 1:47 AM
> To: [EMAIL PROTECTED]; [email protected]
> Subject: RE: [Mpiblast-users] File systems that work well with mpiBLAST-pio
>
> Hi Ravi,
>
> In the paper pioBLAST was compared with mpiBLAST 1.2; mpiBLAST 1.4 has
> improved performance a lot since then =). Besides, mpiBLAST-pio does not
> work exactly the same way as pioBLAST; please refer to the following message
> I posted to the mailing list earlier for their differences.
> http://sourceforge.net/mailarchive/forum.php?thread_id=9626643&forum_id=43689
>
> Currently mpiBLAST-pio provides two output options.
>
> 1) parallel-write. This is a more efficient output strategy which requires
> special support from parallel file systems to ensure the correctness of the
> results. It has been tested on PVFS2 and SGI XFS. I don't have access to other
> parallel file systems, but the parallel-write strategy should work on file
> systems that support the level-2 write access (independent, non-contiguous
> write) mentioned in the following MPI-IO paper:
> Rajeev Thakur, William Gropp, and Ewing Lusk. "A case for using MPI's
> derived datatypes to improve I/O performance". In Proceedings of SC98: High
> Performance Networking and Computing, November 1998.
>
> 2) master-write. This output strategy is less scalable, but it does not
> require special support from file systems, and it is recommended on systems
> with NFS.
>
> The performance difference between mpiBLAST 1.4 and mpiBLAST-pio depends
> heavily on the characteristics of the query set and the database. In our
> experience, even with the master-write output option, mpiBLAST-pio
> shows significant performance improvement when searching queries against
> a large database with bulky output volume (e.g. searching sequences randomly
> sampled from the NT database against the NT database itself).
> However, when searching queries with a small amount of output, mpiBLAST 1.4
> and mpiBLAST-pio deliver similar search throughput.
>
> Which database and query set were you using for the performance comparison?
>
> Thanks,
> Heshan
>
> ________________________________________
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ravi
> Vijaya Satya [Contractor, Foreign National]
> Sent: Friday, March 09, 2007 12:41 PM
> To: [email protected]
> Subject: [Mpiblast-users] File systems that work well with mpiBLAST-pio
>
> I was wondering if mpiBLAST-pio requires any file system features to give
> better performance. In the Lin et al. paper on pioBLAST, it was shown that
> pioBLAST performs better than mpiBLAST. However, I could not see any
> significant improvement in performance over mpiBLAST (1.4.0) using
> mpiBLAST-pio.
>
> Can anyone list some file systems that have the parallel I/O support necessary
> for mpiBLAST-pio? Is Lustre one such file system?
>
> Thanks,
> Ravi

_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
