Hi, I got the latest code drop (r6791) this morning.
I removed the .ompi_ignore and .ompi_unignore files from the ompi/mca/mpool/mvapi directory. If I build without removing them, the MPI program fails with signal 11; after removing those hidden files from that directory and rebuilding, the signal 11 error disappeared.

I configured with the options Galen gave:

./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/ --enable-mca-no-build=btl-openib,pml-teg,pml-uniq

After make all install, I ran Pallas but got the same error messages (see below). I ran it 3-4 times; sometimes I got no output at all and Pallas just hung. That was with pingpong only. When I ran Pallas with all functions (including reduce), I got the following messages in the intra-node case:

Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)

Since George has also seen these types of messages, his upcoming patch might resolve this issue.

Also, I ran the mpi-ping.c program Galen provided against the latest code drop, and it just hung. Here is the output:

[root@micrompi-1 ~]# mpirun -np 2 ./a.out -r 10 0 100000 1000
Could not join a running, existing universe
Establishing a new one named: default-universe-12461
mpi-ping: ping-pong nprocs=2, reps=10, min bytes=0, max bytes=100000 inc bytes=1000
0 pings 1
... I hit Ctrl+C here after 10 minutes ...
2 processes killed (possibly by Open MPI)

I have no clue whether George's patch will fix this problem or not. Before running the mpi-ping program, I exported OMPI_MCA_btl_base_debug=2 in my shell.

Thanks
-Sridhar

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Galen Shipman
Sent: Tuesday, August 09, 2005 11:10 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

Hi

On Aug 9, 2005, at 8:15 AM, Sridhar Chirravuri wrote:

> The same kind of output appears while running the Pallas "pingpong" test.
>
> -Sridhar
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Sridhar Chirravuri
> Sent: Tuesday, August 09, 2005 7:44 PM
> To: Open MPI Developers
> Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>
> I have run the sendrecv function in Pallas but it failed to run. Here is the output:
>
> [root@micrompi-2 SRC_PMB]# mpirun -np 2 PMB-MPI1 sendrecv
> Could not join a running, existing universe
> Establishing a new one named: default-universe-5097
> [0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
> [0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
> [0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
> [0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
> [0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send] Connection to endpoint closed ... connecting ...
> [0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect] Initialized High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785
>
> [0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785
> [0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send] Connection to endpoint closed ... connecting ...
> [0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect] Initialized High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786
>
> [0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786
> #---------------------------------------------------
> # PALLAS MPI Benchmark Suite V2.2, MPI-1 part
> #---------------------------------------------------
> # Date    : Tue Aug 9 07:11:25 2005
> # Machine : x86_64
> # System  : Linux
> # Release : 2.6.9-5.ELsmp
> # Version : #1 SMP Wed Jan 5 19:29:47 EST 2005
>
> #
> # Minimum message length in bytes: 0
> # Maximum message length in bytes: 4194304
> #
> # MPI_Datatype                : MPI_BYTE
> # MPI_Datatype for reductions : MPI_FLOAT
> # MPI_Op                      : MPI_SUM
> #
> #
>
> # List of Benchmarks to run:
>
> # Sendrecv
> [0,1,1][btl_mvapi_endpoint.c:368:mca_btl_mvapi_endpoint_reply_start_connect] Initialized High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
>
> [0,1,1][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info] Received High Priority QP num = 263177, Low Priority QP num 263178, LID = 785
>
> [0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7080096
> [0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7080096
> [0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7080096
>
> [0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7240736
> [0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7240736
> [0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7240736
> [0,1,1][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
> [0,1,0][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info] Received High Priority QP num = 263177, Low Priority QP num 263178, LID = 777
> [0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7241888
> [0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7241888
> [0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7241888
> [0,1,1][btl_mvapi_component.c:523:mca_btl_mvapi_component_progress] Got a recv completion
>
> Thanks
> -Sridhar
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Brian Barrett
> Sent: Tuesday, August 09, 2005 7:35 PM
> To: Open MPI Developers
> Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>
> On Aug 9, 2005, at 8:48 AM, Sridhar Chirravuri wrote:
>
>> Does r6774 have a lot of changes related to the 3rd-generation
>> point-to-point? I am trying to run some benchmark tests (e.g., Pallas)
>> with the Open MPI stack and just want to compare the performance
>> figures with MVAPICH 095 and MVAPICH 092.
>>
>> In order to use the 3rd-generation p2p communication, I added the
>> following line in /openmpi/etc/openmpi-mca-params.conf:
>>
>> pml=ob1
>>
>> I also exported (as a double check) OMPI_MCA_pml=ob1.
>>
>> Then I tried running on the same machine. My machine has 2 processors.
>>
>> mpirun -np 2 ./PMB-MPI1
>>
>> I still see the following lines:
>>
>> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>> Request for 0 bytes (coll_basic_reduce.c, 193)
>> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>> Request for 0 bytes (coll_basic_reduce.c, 193)
>
> These errors are coming from the collective routines, not the PML/BTL
> layers. It looks like the reduction codes are trying to call malloc(0),
> which doesn't work so well. We'll take a look as soon as we can. In the
> meantime, can you just not run the tests that call the reduction
> collectives?
>
> Brian
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/
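
P.S. On the "Request for 0 bytes" lines: per Brian's note above, the basic reduce / reduce_scatter code apparently ends up asking for a zero-byte temporary buffer, i.e. a malloc(0), which the debug memory checks then flag. Just to illustrate the pattern (this is only a sketch; the helper below is made up and is not the actual coll_basic code), the usual guard looks something like:

#include <stdlib.h>

/* Hypothetical helper, not taken from Open MPI: size a scratch buffer
 * for a reduction of 'count' elements of a datatype with the given
 * extent. When count == 0 the request would be malloc(0), which a
 * debug allocator reports as "Request for 0 bytes", so skip it. */
static void *alloc_reduce_scratch(size_t count, size_t extent)
{
    size_t size = count * extent;

    if (size == 0) {
        /* Nothing to reduce; avoid the zero-byte allocation. */
        return NULL;
    }
    return malloc(size);
}

Callers would then treat a NULL return together with size == 0 as "no scratch space needed" rather than as an allocation failure. Whether that is what George's patch does, I don't know; it is just a guess at the shape of the fix.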