Hi Mike,
What compiler are you using? What ./configure command-line options did
you use? Did you run ./configure with --enable-MPI_Alloc_mem? I
personally develop and test on SuSE, Rocks 3.3 (which is a RHEL
derivative), Windows, and occasionally OS X.
To further track this problem down it may be necessary to compile with
debug options and run the program with a debugger attached. If you can
send me the query data set (off the list) I can try to reproduce the
problem.
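As a rough sketch of what I mean by a debug build: the two --with-*
paths below are the ones from your post, while the function names and
the -g/-O0 flags are only illustrative choices on my part. Treat this
as a starting point under those assumptions, not a tested recipe.

```shell
#!/bin/sh
# Sketch only. The --with-* paths come from the original post; the
# function names and compiler flags are illustrative assumptions.

# Rebuild mpiBLAST with debug symbols and without optimization.
# Run this from the top of the mpiBLAST source tree.
rebuild_with_debug() {
    CFLAGS="-g -O0" CXXFLAGS="-g -O0" ./configure \
        --with-ncbi=/usr/local/ncbi --with-mpi=/opt/lam-7.0.6 \
        && make
}

# Prepare the current shell so that an abort leaves evidence behind.
prepare_debug_env() {
    # Allow core files; signal 6 (SIGABRT) writes one when permitted.
    ulimit -c unlimited 2>/dev/null \
        || echo "warning: could not raise the core file size limit" >&2
    # Ask glibc's malloc to run extra consistency checks and abort on
    # the first error it finds.
    MALLOC_CHECK_=2
    export MALLOC_CHECK_
}
```

Call prepare_debug_env in the shell before starting mpirun. Whether
these settings reach the worker nodes depends on how your MPI starts
its remote processes, so you may have to set them on the nodes too. If
a core file turns up, loading it with "gdb /usr/local/bin/mpiblast
<corefile>" and typing "bt" should show the call stack that led to the
bad free().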
-Aaron
Mike Schilling wrote:
Unfortunately, an incomplete version of this mail already went out.
Sorry for posting again.
-------------
Hello everybody,
I am trying to run mpiBLAST on a 17-node (2 processors each) OSCAR 4.2
cluster based on RHEL 4. The master node is a dual Xeon 2.4 GHz, while
the 16 workers are dual Pentium III (933 MHz).
All cluster tests described in the OSCAR installation were successful,
and I was also able to compile the NCBI toolbox (patch included)
without problems. mpiBLAST was compiled using the following options:
--with-ncbi=/usr/local/ncbi and --with-mpi=/opt/lam-7.0.6
This was successful as well. Blasting very small contigs (8 kb) against
the UniProt database works perfectly and quickly. When I try queries
over 30 kb in size, the following error occurs after roughly 5 min:
mpirun -np 34 /usr/local/bin/mpiblast
--debug=/database/tmpfiledir/debug -p blastx -d uniprot -i
/database/tmpfiledir/contig76_53738-84646.tmp -o
/database/results/contigs_masked.out
*** glibc detected *** free(): invalid pointer: 0x08a34870 ***
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 20219 failed on node n7 (10.0.0.8) due to signal 6.
-----------------------------------------------------------------------------
2 134.052 Bailing out with signal -1
3 134.054 Bailing out with signal -1
4 134.055 Bailing out with signal -1
5 134.057 Bailing out with signal -1
67 134.059 Bailing out with signal -1
134.06 Bailing out with signal -1
89 134.063 Bailing out with signal -1
134.063 Bailing out with signal -1
1011 134.066 Bailing out with signal -1
134.066 Bailing out with signal -1
12 134.069 Bailing out with signal -1
15 134.071 Bailing out with signal -1
13 134.071 Bailing out with signal -1
17 134.076 Bailing out with signal -1
16 134.076 Bailing out with signal -1
18 134.077 Bailing out with signal -1
19 134.078 Bailing out with signal -1
20 134.08 Bailing out with signal -1
21 134.081 Bailing out with signal -1
22 23 134.084 Bailing out with signal -1
134.084 Bailing out with signal -124 134.086 Bailing out with
signal -1
25 134.088 Bailing out with signal
-1
27 134.09 Bailing out with signal -1
26 134.091 Bailing out with signal -1
28 134.09329 134.094 Bailing out with signal -1
Bailing out with signal -1
30 134.096 Bailing out with signal -1
32 134.099 Bailing out with signal -1
34 134.1 Bailing out with signal -1
33 134.101 Bailing out with signal -1
31 134.097 Bailing out with signal -1
35 134.1 Bailing out with signal -1
No further error message appears when I switch on debug logs. The run
seems to break suddenly, and the address in the glibc free() message
sometimes differs from node to node.
I also tried the "-ssi rpi lamd" option of mpirun, since the manual
mentions something in connection with lamd, with the same result.
Next, I compiled against both the 2004 and the 2005 versions of the
NCBI toolbox, with a similar result. Compiling against MPICH or LAM
makes no difference either. I also tried to compile the 1.3.0 release
of mpiBLAST, but without success.
Is there anything I can do to get more debug output? Can you
recommend a kernel or a specific RHEL version (or maybe specific
cluster software) on which it is known to run? I do not believe the
problem depends on the hardware.
Any help would be appreciated.
Best regards,
Mike
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users