Sorry for the cross-posting; my earlier email to the mpiblast mailing list
was returned, perhaps because of its attachments.


Hi Aaron,
Hi All,

I have been struggling to run mpiblast on a dataset of proteins from
36 fungal genomes. I want to compare all proteins to each other,
which is usually called an 'all-against-all blast'. Plain blastp on
this dataset, chopped into 36 fragments, takes around 24 hours on our
cluster, which runs Score (version 5.8.4.r3) as its MPI environment
and has 25 nodes, each with 4 cores and 8 GB of RAM.

On the same cluster, when I run mpiblast on 38 fragments using 40
nodes, even after 12 days I get only 22% of the total expected output
(I know the complete output from the earlier standard blast run).
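
To put these figures in perspective, here is a rough extrapolation. It
assumes that output accumulates linearly with time, which is only an
assumption (mpiblast may not write results at a constant rate), and the
function name is made up for illustration:

```python
def estimated_total_days(elapsed_days: float, fraction_done: float) -> float:
    """Extrapolate total runtime, ASSUMING output grows linearly with time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_days / fraction_done


# Figures from this message: 22% of the expected output after 12 days.
total = estimated_total_days(12, 0.22)
print(f"projected mpiblast runtime: {total:.1f} days")

# Plain blastp on the same data took about 1 day (24 hours).
print(f"slowdown relative to plain blastp: roughly {total / 1.0:.0f}x")
```

Under that assumption the mpiblast run would need well over a month,
against about one day for plain blastp.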

Aaron suggested testing the following, but it seems the problem is
still not solved. I tested these suggestions from Aaron:

1) The --db-replicate-count=5 option, so that mpiblast distributes
five copies of the database across the worker nodes, which may
improve performance.

With this option, I checked the results after 4 days and had only
6% of the output.

2) mpiblast is optimized for mpich or lam-mpi, so it may be useful to
test mpiblast with these MPI implementations.

We have GNU mpich installed and ran mpiblast on our dataset
with --debug as below:

>mpiformatdb -i 36FungalJGIanigNbcin_M40 --nfrags=38 -p
T --skip-reorder

>/usr/local/sge6.0/bin/lx24-amd64/qsub -pe mpich 10
/usr/local/mpich-GNU64/bin/mpirun -arch LINUX -machinefile
/usr/local/mpich-GNU64/share/machines.LINUX -np 40 -nodes
10 -nolocal -allcpus -v
/software/man2/manchester/mpiblast/tool/bin/mpiblast --debug -p
blastp -d 36FungalJGIanigNbcin_M40 -i
/scratch/man2/nwmcixa/mpiblast/work/36FungalJGIanigNbcin_M40 -m 8 -e
1e-5 -o
/scratch/man2/nwmcixa/mpiblast/work/36FungalJGIanigNbcin_M40_mpi.out


It did not produce any output. The STDOUT, captured because the
--debug option was used, is shown at:
http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/mpirun_36genomes.e29468



3) The benchmark dataset (300K erwinia query set against the first 14GB
of nt using blastn)

mpiblast was run on the benchmark dataset and it produced 1.07 MB of
output (shown at
http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/echry_In_nt14G_m8.out
)
within 14 minutes. The STDOUT is shown at:
http://www.bioinf.manchester.ac.uk/~alam/xxxxxx/mpiblast/mpirun_benchmark.e29467


I am not sure whether this run produced the complete output, compared
with the standard mpiblast test on the benchmark dataset by
Aaron Darling.
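
If a reference output for the benchmark were available, completeness
could be checked by comparing the query/subject pairs in the two -m 8
(tabular) files. This is only a minimal sketch: the function names are
made up for illustration, and it relies on the first two tab-separated
columns of -m 8 output being the query ID and the subject ID. Hits
close to the e-value cutoff may legitimately differ between runs.

```python
from typing import Dict, Iterable, Set, Tuple


def hit_pairs(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect (query_id, subject_id) pairs from -m 8 tabular lines."""
    pairs = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed or truncated lines
        pairs.add((fields[0], fields[1]))
    return pairs


def compare_outputs(reference: Iterable[str],
                    candidate: Iterable[str]) -> Dict[str, int]:
    """Count pairs shared by both outputs, missing from and extra in candidate."""
    ref = hit_pairs(reference)
    cand = hit_pairs(candidate)
    return {
        "shared": len(ref & cand),
        "missing": len(ref - cand),
        "extra": len(cand - ref),
    }
```

Both arguments can be open file handles, for example the reference
output and the mpiblast output opened with open(). A nonzero "missing"
count would suggest that the mpiblast output is incomplete.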


I am not sure what is causing the slow mpiblast performance on my
dataset.

Is it possible to produce a version of mpiblast that splits the query
dataset instead of the search|target dataset when the sizes of the
query and target dataset are the same? This way there will be no need
to correct the e-value calculation and the output of each fragmented
query against the complete dataset can be concatenated to get the
complete results. Running standard blast in this way produces the
output in around 24 hours.
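
The query-splitting approach described above can be sketched as
follows. This is not mpiblast code; the function names are made up for
illustration, and sequences are dealt round-robin into chunks simply
to keep the chunks roughly equal in size:

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header plus sequence lines) at a time."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">") and record:
            yield "".join(record)  # a new header closes the previous record
            record = []
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(lines: Iterable[str], n_chunks: int) -> List[List[str]]:
    """Distribute FASTA records round-robin into n_chunks lists."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(read_fasta_records(lines)):
        chunks[i % n_chunks].append(rec)
    return chunks
```

Each chunk would be written to its own file and searched with standard
blastp against the complete, unfragmented database. Because every
search sees the whole database, the per-chunk outputs can simply be
concatenated.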

I hope to get some help getting mpiblast to work on the dataset I have.

Many Thanks,

Intikhab



--
Dr. Intikhab Alam
Research Associate
School of Computer Science
University of Manchester
LF1, Kilburn Building,
Oxford Road Manchester, M13 9PL
United Kingdom
http://www.cs.man.ac.uk/~ialam





_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users
