Unfortunately I already sent this mail once, but it was incomplete. Sorry
for posting again ...
-------------
Hello everybody,
I am trying to run mpiblast on a 17-node (2 processors each) oscar 4.2 cluster
based on RHEL 4. The master node is a dual Xeon 2.4 GHz, while the 16 workers
are dual Pentium III (933 MHz).
All cluster tests described in the oscar installation were successful,
and I was also able to compile the NCBI toolbox (patch included) without
problems. Mpiblast was compiled using the following options:
--with-ncbi=/usr/local/ncbi and --with-mpi=/opt/lam-7.0.6
This was successful as well. Blasting very small contigs (8 kb) against
the uniprot database works perfectly and fast. When I try
queries over 30 kb in size, the following error occurs after roughly 5 min:
mpirun -np 34 /usr/local/bin/mpiblast --debug=/database/tmpfiledir/debug
-p blastx -d uniprot -i /database/tmpfiledir/contig76_53738-84646.tmp -o
/database/results/contigs_masked.out
*** glibc detected *** free(): invalid pointer: 0x08a34870 ***
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 20219 failed on node n7 (10.0.0.8) due to signal 6.
-----------------------------------------------------------------------------
2 134.052 Bailing out with signal -1
3 134.054 Bailing out with signal -1
4 134.055 Bailing out with signal -1
5 134.057 Bailing out with signal -1
67 134.059 Bailing out with signal -1
134.06 Bailing out with signal -1
89 134.063 Bailing out with signal -1
134.063 Bailing out with signal -1
1011 134.066 Bailing out with signal -1
134.066 Bailing out with signal -1
12 134.069 Bailing out with signal -1
15 134.071 Bailing out with signal -1
13 134.071 Bailing out with signal -1
17 134.076 Bailing out with signal -1
16 134.076 Bailing out with signal -1
18 134.077 Bailing out with signal -1
19 134.078 Bailing out with signal -1
20 134.08 Bailing out with signal -1
21 134.081 Bailing out with signal -1
22 23 134.084 Bailing out with signal -1
134.084 Bailing out with signal -124 134.086 Bailing out with signal -1
25 134.088 Bailing out with signal
-1
27 134.09 Bailing out with signal -1
26 134.091 Bailing out with signal -1
28 134.09329 134.094 Bailing out with signal -1
Bailing out with signal -1
30 134.096 Bailing out with signal -1
32 134.099 Bailing out with signal -1
34 134.1 Bailing out with signal -1
33 134.101 Bailing out with signal -1
31 134.097 Bailing out with signal -1
35 134.1 Bailing out with signal -1
... no further error message appears when I switch on the debug logs. It
seems to break suddenly, and the address in the "glibc free()" message
is sometimes different on different nodes ...
I also tried the "-ssi rpi lamd" option of mpirun, since the manual
mentions something in connection with lamd, with the same result.
Next, I compiled against both the 2004 and the 2005 version of the NCBI
toolbox, with a similar result. Compiling against mpich instead of lam
makes no difference either. I also tried to compile the 1.3.0 release of
mpiblast, but without success.
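To narrow down the query length at which the crash starts (8 kb works,
more than 30 kb fails), the failing contig could be cut to several
intermediate lengths and each piece blasted separately. The following is
only a minimal sketch, assuming the .tmp query is a single-record FASTA
file; the function names, the output file naming and the sizes in the
usage note are hypothetical and not part of mpiblast:

```python
from pathlib import Path


def truncate_fasta(text, max_bases):
    """Return a single-record FASTA string cut to at most max_bases residues."""
    lines = text.strip().splitlines()
    header = lines[0]
    # Join the sequence lines, then keep only the first max_bases characters.
    seq = "".join(line.strip() for line in lines[1:])[:max_bases]
    # Re-wrap at 60 characters per line, a common FASTA convention.
    wrapped = [seq[i:i + 60] for i in range(0, len(seq), 60)]
    return "\n".join([header] + wrapped) + "\n"


def write_truncated_queries(query_path, sizes):
    """Write one truncated copy of the query per size; return the new paths."""
    query_path = Path(query_path)
    text = query_path.read_text()
    written = []
    for size in sizes:
        # Hypothetical naming scheme: <original name>.<size>bp
        out_path = query_path.with_name(f"{query_path.name}.{size}bp")
        out_path.write_text(truncate_fasta(text, size))
        written.append(out_path)
    return written
```

For example, write_truncated_queries("contig76_53738-84646.tmp",
[10000, 15000, 20000, 25000]) would produce four shorter query files that
can each be passed to mpiblast with -i to see where the failure begins.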
Is there anything I can do to get more debug output? Do you
have a recommendation for a kernel or a specific RHEL version (or
maybe specific cluster software) on which it is known to run? I do not
believe that the problem depends on the hardware.
Any help would be appreciated.
Best regards
Mike
--
+--------------------------------------------------------------------+
| Mike Schilling |
| MWG Biotech AG voice : int+49 8092 8289303 |
| Anzinger Strasse 7, fax : int+49 8092 8289561 |
| D-85560 Ebersberg, Germany email : [EMAIL PROTECTED] |
| web : http://www.mwgdna.com |
+--------------------------------------------------------------------+
| Contrary to popular belief, UNIX is user friendly. It just happens |
| to be very selective about who it decides to make friends with. |
+--------------------------------------------------------------------+
_______________________________________________
Mpiblast-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users