Thanks for finding this, George!

On Aug 10, 2005, at 12:37 AM, George Bosilca wrote:

I ran all the ex-Pallas tests and the same error happens: we try to malloc 0 bytes and then hang somewhere. Let me explain what I found. First of all, most of the tests seem to work perfectly (at least with the PTLs/BTLs I was able to run: sm, tcp, mx). Both the deadlock and the memory allocation
problem happen in the reduce_scatter operation.

Problem 1: allocating 0 bytes
- It's not a datatype problem: the datatype returns the correct extent,
  true_extent and lb. The problem is that we miss one case in the collective
  communications: what about the case where the user does a reduce_scatter
  with all the counts set to zero? The counts pass the sanity check, then we
  add them together and, as expected, a sum of zeros is zero. So at
  coll_basic_reduce_scatter.c line 79 we allocate zero bytes, because the
  extent and the true_extent of the MPI_FLOAT datatype are equal and
  (count - 1) is -1! There is a simple fix for this: if count == 0, then
  free_buffer should be set to NULL (as we never send or receive anything
  from this buffer at the PTL/PML level, it will just work fine).
- The same error can happen in the reduce function if the count is zero;
  I will protect that function too. (A minimal reproducer sketch follows
  below.)
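
For illustration, here is a minimal standalone reproducer of the all-zero-counts case (my own sketch, not part of the Pallas suite); on an unpatched build it should trigger the "Request for 0 bytes" warning from the basic reduce_scatter:

  /* zero_reduce_scatter.c -- sketch of the all-zero-counts case.
   * Every rank contributes and receives 0 elements, which is legal
   * MPI but drives coll_basic_reduce_scatter into a malloc(0)
   * because true_extent + (count - 1) * extent ends up being 0. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, i;
      float sendbuf[1] = { 0.0f }, recvbuf[1] = { 0.0f };
      int *recvcounts;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      recvcounts = malloc(size * sizeof(int));
      for (i = 0; i < size; ++i) recvcounts[i] = 0;  /* all counts zero */

      MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts,
                         MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

      if (0 == rank) printf("zero-count reduce_scatter completed\n");
      free(recvcounts);
      MPI_Finalize();
      return 0;
  }

With the fix, the collective simply skips the temporary buffer and returns.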

Problem 2: hanging
- Somehow a strange optimization got into the scatterv function: in the case
  where the sender has zero bytes to send, it completely skips the send
  operation, but the receiver still expects to get a message. This
  optimization is not correct; all messages have to be sent. I know it can
  (slightly) increase the time for the collective, but it gives us a simple
  way of checking the correctness of the global communication (as the PML
  handles the truncation case). A patch is on the way; see the small sketch
  below for why the send cannot be skipped.
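
To make the matching argument concrete, here is a tiny sketch (mine, not the Pallas or scatterv code) showing why a zero-byte send cannot be skipped -- the posted receive on the other side blocks until something matches it:

  /* zero_byte_send.c -- why skipping a 0-byte send deadlocks the peer. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      char dummy = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (0 == rank) {
          /* If this send were "optimized away", rank 1 would hang below. */
          MPI_Send(&dummy, 0, MPI_BYTE, 1, 42, MPI_COMM_WORLD);
      } else if (1 == rank) {
          MPI_Recv(&dummy, 0, MPI_BYTE, 0, 42, MPI_COMM_WORLD, &status);
          printf("rank 1 matched the zero-byte message\n");
      }

      MPI_Finalize();
      return 0;
  }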

Once these 2 problems are corrected we pass all the Pallas MPI1 tests. I ran the tests with the PMLs ob1, teg and uniq, and the PTLs/BTLs sm, tcp, gm (PTL)
and mx (PTL), with 2 and 8 processes.

  george.

PS: the patches will be committed soon.

On Aug 9, 2005, at 1:53 PM, Galen Shipman wrote:

      Hi Sridhar,

I have committed changes that allow you to set the debug verbosity:

OMPI_MCA_btl_base_debug
0 - no debug output
1 - standard debug output
2 - very verbose debug output
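
For example, setting OMPI_MCA_btl_base_debug=2 in your environment before launching should do it, or you can set it for a single run with something like "mpirun -mca btl_base_debug 2 -np 2 ./your_test" (the usual MCA parameter mechanisms apply; ./your_test is just a placeholder for whatever you are running).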

Also, we have run the Pallas tests and are not able to reproduce your failures. We do see a warning in the Reduce test, but it does not hang and runs to completion. Attached is a simple ping-pong program;
try running it and let us know the results.

Thanks,

Galen



<mpi-ping.c>

On Aug 9, 2005, at 8:15 AM, Sridhar Chirravuri wrote:


      I get the same kind of output while running the Pallas "pingpong" test.

-Sridhar

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Sridhar Chirravuri
Sent: Tuesday, August 09, 2005 7:44 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI


I have run the sendrecv test in Pallas, but it failed to run. Here is the
output:

[root@micrompi-2 SRC_PMB]# mpirun -np 2 PMB-MPI1 sendrecv
Could not join a running, existing universe
Establishing a new one named: default-universe-5097
[0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
[0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub


[0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub

[0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub

[0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send] Connection
to endpoint closed ... connecting ...
[0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect]
Initialized High Priority QP num = 263177, Low Priority QP num = 263178,
LID = 785

[0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req]
Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785
[0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send]
Connection to endpoint closed ... connecting ...
[0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect]
Initialized High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786

[0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req]
Sending High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786
#---------------------------------------------------
#    PALLAS MPI Benchmark Suite V2.2, MPI-1 part
#---------------------------------------------------
# Date       : Tue Aug  9 07:11:25 2005
# Machine    : x86_64
# System     : Linux
# Release    : 2.6.9-5.ELsmp
# Version    : #1 SMP Wed Jan 5 19:29:47 EST 2005

#
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Sendrecv
[0,1,1][btl_mvapi_endpoint.c:368:mca_btl_mvapi_endpoint_reply_start_connect]
Initialized High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777

[0,1,1][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info]
Received High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785

[0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
Modified to init..Qp 7080096
[0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTR..Qp 7080096
[0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTS..Qp 7080096

[0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
Modified to init..Qp 7240736
[0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTR..Qp 7240736
[0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTS..Qp 7240736
[0,1,1][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req]
Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
[0,1,0][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info]
Received High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
[0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
Modified to init..Qp 7081440
[0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTR..Qp 7081440
[0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTS..Qp 7081440
[0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
Modified to init..Qp 7241888
[0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTR..Qp 7241888
[0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query]
Modified to RTS..Qp 7241888
[0,1,1][btl_mvapi_component.c:523:mca_btl_mvapi_component_progress] Got
a recv completion


Thanks
-Sridhar




-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Brian Barrett
Sent: Tuesday, August 09, 2005 7:35 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

On Aug 9, 2005, at 8:48 AM, Sridhar Chirravuri wrote:


      Does r6774 have a lot of changes related to the 3rd-generation
point-to-point? I am trying to run some benchmark tests (e.g.,
Pallas) with the Open MPI stack and just want to compare the
performance figures with MVAPICH 0.9.5 and MVAPICH 0.9.2.

In order to use the 3rd-generation p2p communication, I have added the
following line to /openmpi/etc/openmpi-mca-params.conf:

pml=ob1

I also exported OMPI_MCA_pml=ob1 as a double check.

Then I tried running on the same machine, which has 2 processors:

mpirun -np 2 ./PMB-MPI1

I still see the following lines:

Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)


These errors are coming from the collective routines, not the PML/BTL
layers.  It looks like the reduction code is trying to call malloc(0),
which doesn't work so well.  We'll take a look as soon as we can.  In
the meantime, can you just not run the tests that call the reduction
collectives?
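
For context, the reason a bare malloc(0) is a problem: the C standard lets it return either NULL or a unique pointer, so callers that treat NULL as an out-of-memory failure can misfire, and that is what the "Request for 0 bytes" warnings above are complaining about. A trivial illustration (just a sketch, not Open MPI code):

  /* malloc_zero.c -- malloc(0) behavior is implementation-defined. */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      void *p = malloc(0);
      printf("malloc(0) returned %p (%s)\n", p,
             p ? "a unique pointer" : "NULL, which is also allowed");
      free(p);   /* freeing either result is fine */
      return 0;
  }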

Brian


-- 
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/







"We must accept finite disappointment, but we must never lose infinite
hope."
                                  Martin Luther King




--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

