Okay,
this was a build system issue with changing CFLAGS... worked that out,
and then found the real problem: using a size_t with MCA params
instead of an int. Fix coming shortly.
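For the archives, here is a minimal sketch of that failure mode. The register_int_param helper and the parameter name below are made up for illustration (this is not the actual Open MPI registration code); the point is that a register call which writes through an int pointer only touches 4 of the 8 bytes of a size_t, so on a 64-bit box (and the traces look like big-endian PPC64) the component later reads back garbage:

#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for an MCA parameter registration call: it
 * stores the parameter's value through an int pointer, so it writes
 * exactly sizeof(int) == 4 bytes. */
static void register_int_param(const char *name, int default_value, int *storage)
{
    *storage = default_value;
    printf("registered %s = %d\n", name, default_value);
}

int main(void)
{
    /* Bug pattern: the component keeps the value in a size_t (8 bytes on
     * a 64-bit system) but hands it to the int-based API via a cast.
     * Only half of the variable is written; on a big-endian 64-bit node
     * the 4 bytes land in the high half, so the value read back later is
     * garbage (on little-endian the stale high bytes survive instead). */
    size_t eager_limit = 0xdeadbeefcafef00dULL;   /* stale/garbage contents */
    register_int_param("example_eager_limit", 1024, (int *)&eager_limit);

    printf("value later used by the component: %zu\n", eager_limit);
    /* The fix: declare the backing variable as int (or copy the registered
     * int into the size_t afterwards) so pointer and pointee sizes match. */
    return 0;
}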
On Jun 29, 2006, at 2:47 PM, Galen M. Shipman wrote:
More info:
Two core files are generated.
mpirun:
(gdb) bt
#0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
#1 0x0000040000251b10 in .__GI___pthread_mutex_unlock () from /lib64/libpthread.so.0
#2 0x00000400001108e0 in .poll_dispatch () from /home/ompi/local/lib/libopal.so.0
#3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
#4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
#5 0x0000000010003a08 in opal_condition_wait (c=0x1001acf0, m=0x1001aca0)
   at ../../../opal/threads/condition.h:81
#6 0x0000000010003474 in orterun (argc=7, argv=0xfffffea5948) at orterun.c:415
#7 0x0000000010002c50 in main (argc=7, argv=0xfffffea5948) at main.c:13
#8 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
orted:
#0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
#1 0x0000040000251b10 in .__GI___pthread_mutex_unlock () from /lib64/libpthread.so.0
#2 0x00000400001108e0 in .poll_dispatch () from /home/ompi/local/lib/libopal.so.0
#3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
#4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
#5 0x000004000051a35c in mca_oob_tcp_msg_wait (msg=0x10022d68, rc=0xfffffe24d60)
   at oob_tcp_msg.c:106
#6 0x000004000052497c in mca_oob_tcp_send (name=0x1004a230, iov=0xfffffe24e50,
   count=1, tag=2, flags=0) at oob_tcp_send.c:158
#7 0x0000040000095e40 in mca_oob_send_packed (peer=0x1004a230, buffer=0x1003cf50,
   tag=2, flags=0) at base/oob_base_send.c:78
#8 0x0000040000560b50 in orte_gpr_proxy_subscribe (num_subs=1,
   subscriptions=0xfffffe25030, num_trigs=1, trigs=0xfffffe25090)
   at gpr_proxy_subscribe.c:121
#9 0x000004000007a6ec in orte_gpr_base_subscribe_1 (id=0xfffffe251a0,
   trig_name=0x1003ce80 "orte-stage1-0", sub_name=0x1003cd20 "ompi-oob-sub-0",
   action=39 '\'', addr_mode=514, segment=0x1003cea0 "orte-job-0", tokens=0x0,
   key=0x400005260a8 "oob-tcp", cbfunc=0x400005392d0 <mca_oob_tcp_registry_callback>,
   user_tag=0x0) at base/gpr_base_simplified_subscribe.c:92
#10 0x0000040000517a7c in mca_oob_tcp_init () at oob_tcp.c:816
#11 0x0000040000095110 in mca_oob_base_module_init () at base/oob_base_init.c:263
#12 0x000004000005ef18 in orte_init_stage2 () at runtime/orte_init_stage2.c:48
#13 0x0000040000062fe8 in orte_system_init (infrastructure=true)
    at runtime/orte_system_init.c:46
#14 0x000004000005ce50 in orte_init (infrastructure=true) at runtime/orte_init.c:48
#15 0x0000000010001ebc in main (argc=19, argv=0xfffffe267d8) at orted.c:282
#16 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
On Jun 29, 2006, at 2:33 PM, Galen M. Shipman wrote:
Hey Owen,
Taking this on list..
If I run on n249 orte just hangs waiting for completion of the send.
If I run on n248 I get:
[ompi@node-192-168-111-248 ~]$ mpirun -np 1 -mca btl self,openib ./ring
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x10
[0] func:/home/ompi/local/lib/libopal.so.0 [0x4000012d6c0]
[1] func:/lib64/libpthread.so.0 [0x40000257270]
[2] func:[0x100428]
[3] func:/home/ompi/local/lib/libopal.so.0 [0x40000158310]
[4] func:/lib64/libpthread.so.0 [0x40000251b10]
[5] func:/home/ompi/local/lib/libopal.so.0 [0x400001108e0]
[6] func:/home/ompi/local/lib/libopal.so.0 [0x4000010ea48]
[7] func:/home/ompi/local/lib/libopal.so.0 [0x40000104078]
[8] func:mpirun [0x10003a08]
[9] func:mpirun [0x10003474]
[10] func:mpirun [0x10002c50]
[11] func:/lib64/libc.so.6 [0x40000336dc8]
*** End of error message ***
Segmentation fault
In order to debug, can I get an xterm with proper X forwarding on this
machine?
- Galen