This now seems to be fixed with r19538.

On 9/10/08, Ralph Castain <r...@lanl.gov> wrote:
>
> I'm sorry - I can't even make sense of this. If you think you can reproduce
> it, then you are welcome to fix it. I cannot reproduce it, and hence can do
> nothing further about it.
>
> Ralph
>
>
> On Sep 10, 2008, at 2:01 AM, Lenny Verkhovsky wrote:
>
>> Hi Ralph,
>>
>> I can recreate this failure. I think it is caused by the fact that we do
>> not open orted on the last node (though I didn't check it), since
>> np < number of hosts.
>>
>> I used the following configure line:
>> ../configure --prefix=/home/USERS/lenny/OMPI_ORTE_TRUNK
>>
>> on OMPI 1.4a1r19522
>> Hope it helps.
>>
>> #mpirun -np 3 -H witch2 ./spawn_multiple
>> Parent: 1 of 3, witch2 (1 in init)
>> Parent: 0 of 3, witch2 (1 in init)
>> Parent: 2 of 3, witch2 (1 in init)
>> #mpirun -np 3 -H witch2,witch3 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 2 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> #mpirun -np 3 -H witch2,witch3,witch4 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> Parent: 2 of 3, witch4 (0 in init)
>> #mpirun -np 3 -H witch2,witch3,witch4,witch5 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> Parent: 2 of 3, witch4 (0 in init)
>> [witch1:04806] *** Process received signal ***
>> [witch1:04806] Signal: Segmentation fault (11)
>> [witch1:04806] Signal code: Address not mapped (1)
>> [witch1:04806] Failing at address: 0x38
>> [witch1:04806] [ 0] /lib64/libpthread.so.0 [0x2af5324e9c10]
>> [witch1:04806] [ 1] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_app_report_launch+0x27a) [0x2af531de3dca]
>> [witch1:04806] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0 [0x2af531f161bb]
>> [witch1:04806] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x40378f]
>> [witch1:04806] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0 [0x2af531f161bb]
>> [witch1:04806] [ 5] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2af531f0bf5e]
>> [witch1:04806] [ 6] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_trigger_event+0x44) [0x2af531dc6c84]
>> [witch1:04806] [ 7] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_app_report_launch+0x20b) [0x2af531de3d5b]
>> [witch1:04806] [ 8] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0 [0x2af531f161bb]
>> [witch1:04806] [ 9] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2af531f0bf5e]
>> [witch1:04806] [10] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x227) [0x2af531de47e7]
>> [witch1:04806] [11] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_plm_rsh.so [0x2af532c38d3d]
>> [witch1:04806] [12] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x456) [0x2af531de3086]
>> [witch1:04806] [13] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0 [0x2af531f161bb]
>> [witch1:04806] [14] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x4033bc]
>> [witch1:04806] [15] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402c23]
>> [witch1:04806] [16] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2af532610154]
>> [witch1:04806] [17] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402b79]
>> [witch1:04806] *** End of error message ***
>> Segmentation fault
>>
>> Lenny.