Hi, I have a query file with 359,001 ESTs (218MB). I'm running a blastx using
mpiBLAST against the Swiss-Prot UniProt database, which contains 241,242 protein
sequences (119MB). The process on the master node has been doing all the
work. The worker nodes just sit idle, and after more than 12 hours I still have
no results and the worker nodes are still doing nothing.
Here's my command-line entry:
/usr/local/mpich/bin/mpirun -np 12 -machinefile
/home/userx/test/mpiblast/machines /usr/local/mpiblast/bin/mpiblast -p blastx
-i /local/scratch/rosaceae2006_6_14.trim.lib -d uniprot_sprot.fasta -e 1e-5 -F
F -o /home/userx/test/mpiblast/results.out --debug=/test/mpiblast/debug.out
I'm running this job on a 60-node cluster with each node having dual 64-bit AMD
Opteron 2.2GHz processors and 2GB of RAM. The database files, mpiblast
binaries and output files all reside on NFS-mounted filesystems.
Is mpiblast really just in the "preparation" phase before the workers get busy?
Can anyone tell me if I am just being impatient by thinking there's something
wrong? Will I get results if I just let it continue to run... or does
something look amiss?
Thanks,
Stephen
Below is the output from the debug logs and also from strace...
From process 0:
[0] 0.125637 Locking fragment list
[0] 0.125722 Locked fragment list
[0] 0.14224 broadcasting file size of 218960601
[0] 0.142269 file size broadcasted
[0] 0.757526 broadcasting file
[0] 83.6573 file broadcasted
[0] 83.679 initializing ncbi ...blastall -p blastx -i
/local/scratch/rosaceae2006_6_14.trim.lib -e 1e-5 -F F -o
/home/userx/test/mpiblast/results.out -d
/share/dblibs/mpiblast/uniprot_sprot.fasta
[0] 83.6794
(0) done initializing ncbi.
[0] 83.7389 Init blast error code 0
From process 1:
[1] 0.181678 Temp name base:
/local/scratch/rosaceae2006_6_14.trim.libXXXXXX
[1] 0.181869 Got temp name:
/local/scratch/rosaceae2006_6_14.trim.libAIhXHa
[1] 0.181895 waiting for file size broadcast
[1] 0.181913 received file size broadcast of 218960601
[1] 0.181945 opening receive file
/local/scratch/rosaceae2006_6_14.trim.libAIhXHa
[1] 0.181991 receiving file to
/local/scratch/rosaceae2006_6_14.trim.libAIhXHa
[1] 83.7265 received file broadcast
[1] 254.948 Query file received as /local/scratch/rosaceae2006_6_14.trim.lib
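The timestamps in the debug output above can be turned into per-step durations with a short script. This is only a minimal sketch and not part of mpiBLAST: the function name `phase_gaps` is made up, and the line layout `[rank] seconds message` is an assumption based solely on the excerpt shown here.

```python
import re

# Assumed layout of a debug line: "[rank] seconds optional-message".
# Wrapped continuation lines (paths, "(0) done ...") do not match and are skipped.
ENTRY = re.compile(r"^\[(\d+)\]\s+([0-9.]+)(?:\s+(.*))?$")


def phase_gaps(log_text):
    """Return (rank, seconds_until_next_entry, message) for each timestamped line."""
    by_rank = {}
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            by_rank.setdefault(int(m.group(1)), []).append(
                (float(m.group(2)), m.group(3) or "")
            )
    gaps = []
    for rank in sorted(by_rank):
        entries = by_rank[rank]
        # Pair each entry with the next one from the same rank.
        for (start, message), (end, _) in zip(entries, entries[1:]):
            gaps.append((rank, end - start, message))
    return gaps


sample = """[0] 0.757526 broadcasting file
[0] 83.6573 file broadcasted
[1] 83.7265 received file broadcast
[1] 254.948 Query file received as /local/scratch/rosaceae2006_6_14.trim.lib"""

for rank, seconds, message in phase_gaps(sample):
    print(f"rank {rank}: {seconds:8.1f} s after '{message}'")
```

The longest gaps show which logged step each rank spent the most time in before its next message.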
When the process first begins, it appears to be reading in the query file. This
takes approximately 1 1/2 hours. Here's a portion of the strace:
read(4, "CCAACTGTAACTTAACCGGGAGAGGTCCCGCC"..., 4096) = 4096
read(4, "ACCGGTGGAGTGAAGAAGCCCCACCGTTTCAG"..., 4096) = 4096
read(4, "\nCATCCACGACTTTTGTTCCGACATGGCTCTC"..., 4096) = 4096
mmap(NULL, 2461696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aafb5e000
munmap(0x2ab000e000, 2461696) = 0
read(4, "GGATTAGATGGTAAGAATAACTGGAGAGTTGA"..., 4096) = 4096
mmap(NULL, 2461696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aafdb7000
munmap(0x2ab0267000, 2461696) = 0
read(4, "AATCCTGATGTGAACAAGAAGCTAAGTGCTGC"..., 4096) = 4096
read(4, "TAGTGGGCTTGCCGCACTGCTCAAGGCGGCCC"..., 4096) = 4096
brk(0x1e7f6000) = 0x1e7f6000
brk(0x1e7f4000) = 0x1e7f4000
read(4, "CCGTCTGGATCTCCCGCGAAGT\nGATAGTCGG"..., 4096) = 4096
read(4, "TCCAATCTGTTCCAGCTTCCATAAGACCGTGG"..., 4096) = 4096
read(4, "AAACTNCTGAGTGTCGACTCCCTTT\n>Malus"..., 4096) = 4096
read(4, "GCCTTGACTACCTTG\nGCAACCCAAACCTTAT"..., 4096) = 4096
read(4, "TCTTGAGAGCAGGAGCTGCCAAG\nGCCCTTGG"..., 4096) = 4096
read(4, "TCTTCCCTGTTCCTCCATTTCCGAGCTCCAAA"..., 4096) = 4096
read(4, "GACCAAAC\nTCGGCCGCCTCGTGAAGGAAGGC"..., 4096) = 4096
mmap(NULL, 2461696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2ab0010000
munmap(0x2aafb5e000, 2461696) = 0
brk(0x1e815000) = 0x1e815000
read(4, "C\nAGGCTCAGTCCGGAACCGGGAAAACAGCAA"..., 4096) = 4096
read(4, "\nCGTAGTAGCATAACTACTACGAATTTC\n>Ma"..., 4096) = 4096
read(4, "CCTGNGAAAGTCTTGCTGAACCAATACAG\nAC"..., 4096) = 4096
read(4, "ACAAGAACTTG\nTGCCCAGGATGAGGTTTTAA"..., 4096) = 4096
read(4, "AAAGAACCCTATTTAGGGAGTGCAATAGAGA\n"..., 4096) = 4096
read(4, "GCGGCCCACCACGGCGTCGTCACAAGCGACTG"..., 4096) = 4096
read(4, "ACCATGGGCAATGATTTGTGGTATGGACCGGA"..., 4096) = 4096
read(4, "AAAGATGTCATTTTCATGATGATGATATTGCC"..., 4096) = 4096
brk(0x1e836000) = 0x1e836000
brk(0x1e835000) = 0x1e835000
mmap(NULL, 2461696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aafb5e000
munmap(0x2ab0010000, 2461696) = 0
read(4, "ATCTTGCCATATGATTATGAAAAAAATGAAGT"..., 4096) = 4096
mmap(NULL, 2461696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2ab0010000
munmap(0x2aafdb7000, 2461696) = 0
read(4, "AAGATGTCGAGTGCCGCCGATCCCGAGCACAG"..., 4096) = 4096
read(4, "GCTCGTGGACAGCAGGGTTCTTATCCTATCAA"..., 4096) = 4096
read(4, "TAGGTGAGAATCCTTCTCTGGATTGGTCCAAC"..., 4096) = 4096
read(4, "GAAAGTTTTAATATTTTGAATTTAAATTTG\nA"..., 4096) = 4096
read(4, "TTCTCGGCACAATATCTTGCATGCGATAAATA"..., 4096) = 4096
brk(0x1e856000) = 0x1e856000
read(4, "TTGCCTCACCATGTGCACAAGCTGTGTCGCAT"..., 4096) = 4096
Later, the program appears to be doing something with the database files. It
never seems to come out of this phase; it just loops, doing the same type of
thing over and over again:
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2ab9fc3000
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aba004000
stat("BLOSUM62", 0x7fbfffd0c0) = -1 ENOENT (No such file or directory)
stat("/usr/local/blast/data/BLOSUM62", {st_mode=S_IFREG|0644, st_size=2061,
...}) = 0
open("/usr/local/blast/data/BLOSUM62", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=2061, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aba045000
read(6, "# Entries for the BLOSUM62 matri"..., 32768) = 2061
read(6, "", 32768) = 0
close(6) = 0
munmap(0x2aba045000, 32768) = 0
munmap(0x2aba004000, 266240) = 0
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aba004000
munmap(0x2ab9fc3000, 266240) = 0
munmap(0x2aba004000, 528384) = 0
munmap(0x2ab9e29000, 166504) = 0
munmap(0x2ab9e52000, 166504) = 0
munmap(0x2ab9e7b000, 166504) = 0
munmap(0x2ab9ea4000, 166504) = 0
munmap(0x2ab9ecd000, 166512) = 0
munmap(0x2ab9ef6000, 166512) = 0
munmap(0x2ab9f1f000, 166512) = 0
munmap(0x2ab9f48000, 166512) = 0
munmap(0x2ab9f71000, 166512) = 0
munmap(0x2ab9f9a000, 166504) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.pal", {st_mode=S_IFREG|0644,
st_size=604, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.pal", O_RDONLY) = 6
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.pal", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=604, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2ab9e29000
read(7, "#\n# Alias file created Wed Nov 2"..., 32768) = 604
close(7) = 0
munmap(0x2ab9e29000, 32768) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=604, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2ab9e29000
read(6, "#\n# Alias file created Wed Nov 2"..., 32768) = 604
read(6, "", 32768) = 0
close(6) = 0
munmap(0x2ab9e29000, 32768) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin", O_RDONLY) = 6
mmap(NULL, 166504, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9e29000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.psd",
{st_mode=S_IFREG|0644, st_size=768708, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.psi",
{st_mode=S_IFREG|0644, st_size=17296, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.pin", O_RDONLY) = 6
mmap(NULL, 166504, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9e52000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.psd",
{st_mode=S_IFREG|0644, st_size=768492, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.psi",
{st_mode=S_IFREG|0644, st_size=17274, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.001.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.pin", O_RDONLY) = 6
mmap(NULL, 166504, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9e7b000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.psd",
{st_mode=S_IFREG|0644, st_size=768856, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.psi",
{st_mode=S_IFREG|0644, st_size=17246, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.002.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.pin", O_RDONLY) = 6
mmap(NULL, 166504, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9ea4000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.psd",
{st_mode=S_IFREG|0644, st_size=768804, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.psi",
{st_mode=S_IFREG|0644, st_size=17257, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.003.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.pin", O_RDONLY) = 6
mmap(NULL, 166512, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9ecd000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.psd",
{st_mode=S_IFREG|0644, st_size=769222, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.psi",
{st_mode=S_IFREG|0644, st_size=17320, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.004.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.pin", O_RDONLY) = 6
mmap(NULL, 166512, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9ef6000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.psd",
{st_mode=S_IFREG|0644, st_size=768998, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.psi",
{st_mode=S_IFREG|0644, st_size=17265, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.005.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.pin", O_RDONLY) = 6
mmap(NULL, 166512, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9f1f000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.pnd",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.psd",
{st_mode=S_IFREG|0644, st_size=768576, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.psi",
{st_mode=S_IFREG|0644, st_size=17254, ...}) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.006.ppd", 0x7fbffef130) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.007.pal", 0x7fbffea020) = -1
ENOENT (No such file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.007.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.007.pin",
{st_mode=S_IFREG|0644, st_size=166512, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.007.pin", O_RDONLY) = 6
mmap(NULL, 166512, PROT_READ, MAP_SHARED, 6, 0) = 0x2ab9f48000
close(6) = 0
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.007.pnd",
t/uniprot_sprot.fasta.pal", O_RDONLY <unfinished ...>
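To check whether this phase really repeats the same calls, the strace output can be tallied by system call and path. The following is a minimal sketch under stated assumptions: the helper name `tally_paths` is hypothetical, and it only looks at `stat` and `open` lines whose first argument is a quoted path, as in the excerpt above.

```python
import re
from collections import Counter

# Matches lines such as: stat("/some/path", ...) or open("/some/path", O_RDONLY)
CALL = re.compile(r'^(\w+)\("([^"]+)"')


def tally_paths(strace_text, syscalls=("stat", "open")):
    """Count (syscall, path) pairs; wrapped continuation lines are ignored."""
    counts = Counter()
    for line in strace_text.splitlines():
        m = CALL.match(line)
        if m and m.group(1) in syscalls:
            counts[(m.group(1), m.group(2))] += 1
    return counts


sample = '''stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
stat("/share/dblibs/mpiblast/comindex.mm", 0x7fbffef130) = -1 ENOENT (No such
file or directory)
stat("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin",
{st_mode=S_IFREG|0644, st_size=166504, ...}) = 0
open("/share/dblibs/mpiblast/uniprot_sprot.fasta.000.pin", O_RDONLY) = 6
read(6, "", 32768) = 0'''

for (syscall, path), n in tally_paths(sample).most_common():
    print(n, syscall, path)
```

Run over a longer capture, counts that keep growing for the same fragment files would suggest the same database files are being reopened repeatedly.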
Here's the 'top' output while it's in this phase:
top - 10:59:05 up 13 days, 17:19, 2 users, load average: 15.39, 14.99, 14.91
Tasks: 106 total, 13 running, 93 sleeping, 0 stopped, 0 zombie
Cpu(s): 90.5% us, 8.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.2% hi, 0.5% si
Mem: 2055592k total, 1841228k used, 214364k free, 36076k buffers
Swap: 4104596k total, 51876k used, 4052720k free, 560236k cached
PID  USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
1364 userx 25  0  662m 657m 5408 R 96.9 32.8 288:42.29 mpiblast
1366 userx 25  0 52764  50m 9164 R 17.0  2.5  32:03.48 mpiblast
1368 userx 25  0 52764  50m 9140 R 17.0  2.5  42:13.77 mpiblast
1365 userx 25  0 52764  50m 9272 R 16.3  2.5  31:41.58 mpiblast
1374 userx 25  0 52764  50m 9124 R 16.3  2.5  36:11.10 mpiblast
1376 userx 25  0 52764  50m 9180 R 16.0  2.5  40:15.22 mpiblast
1371 userx 25  0 52764  50m 9088 R 15.6  2.5  48:58.12 mpiblast
1367 userx 25  0 52764  50m 9248 R  0.7  2.5  18:32.87 mpiblast
1369 userx 25  0 52764  50m 9120 R  0.7  2.5  13:34.60 mpiblast
1373 userx 25  0 52764  50m 9104 R  0.7  2.5  27:16.60 mpiblast
6724 userx 16  0  5284  928  692 R  0.7  0.0   0:00.16 top
1370 userx 25  0 52764  50m 8972 R  0.3  2.5   7:01.37 mpiblast
1335 userx 17  0 53804  868  864 S  0.0  0.0   0:00.00 sh
1336 userx 17  0 53940  916  912 S  0.0  0.0   0:00.01 mpirun
1375 userx 25  0 52764  50m 9156 R  0.0  2.5   6:18.04 mpiblast
6363 userx 16  0 35768 2784 2012 S  0.0  0.1   0:00.29 sshd
6364 userx 15  0 54824 1668  928 S  0.0  0.1   0:00.07 tcsh
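The process listing above can be summarized per command with a few lines of Python. This is a rough sketch, not an mpiBLAST or top feature: the function name `rows_for` is made up, and it assumes the 12-column layout shown here (PID first, %CPU ninth, COMMAND last).

```python
def rows_for(top_text, command="mpiblast"):
    """Return (pid, cpu_percent) for each process line whose COMMAND matches."""
    rows = []
    for line in top_text.splitlines():
        fields = line.split()
        # Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) == 12 and fields[0].isdigit() and fields[11] == command:
            rows.append((int(fields[0]), float(fields[8])))
    return rows


sample = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1364 userx 25 0 662m 657m 5408 R 96.9 32.8 288:42.29 mpiblast
1366 userx 25 0 52764 50m 9164 R 17.0 2.5 32:03.48 mpiblast
6724 userx 16 0 5284 928 692 R 0.7 0.0 0:00.16 top"""

rows = rows_for(sample)
print(len(rows), "mpiblast processes on this host")
print("busiest PID:", max(rows, key=lambda r: r[1])[0])
```

Applied to the full listing, this gives the number of mpiblast processes running on this one host and shows which of them is using most of the CPU.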