Well, I haven't tried 1.7.2 yet, but to elaborate on the problem a little more:

the growth happens when we use MPI_ALLREDUCE in a recursive subroutine call; that means, in Fortran 90 terms, the subroutine calls itself again and is specially marked (RECURSIVE) in order to work properly. Apart from that, nothing about this routine is special. Is it possible that the F77 interface in Open MPI is not able to cope with recursion?
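For readers following along, here is a hypothetical C stand-in for the pattern described above (the real code is Fortran 90 and the collective is MPI_ALLREDUCE; all names here are invented for illustration). The point is only the call structure: a routine that calls itself issues one allreduce per recursion level, so a single traversal of depth d drives d+1 collective calls, each of which allocates send requests inside the PML.

```c
#include <assert.h>

/* Hypothetical stand-in for the pattern Max describes: a recursive routine
 * (RECURSIVE SUBROUTINE in Fortran 90) that performs an allreduce at every
 * level of the recursion. The MPI call is stubbed out; we only count how
 * many collective calls one traversal issues. */
static int allreduce_calls = 0;

static void stub_allreduce(double *x) {
    (void)x;
    allreduce_calls++;   /* stands in for MPI_ALLREDUCE(..., MPI_SUM, ...) */
}

static void solve_level(double *x, int depth) {
    stub_allreduce(x);              /* one collective per recursion level */
    if (depth > 0)
        solve_level(x, depth - 1);  /* the routine calls itself */
}
```

Each allreduce internally posts sends whose request structures come from the pml free list, which is why a deep recursion executed many times can drive heavy traffic through that list.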

MAX



On 13.09.13 17:18, Rolf vandeVaart wrote:
Yes, it appears the send_requests list is the one that is growing.  This list
holds the send request structures that are in use.  After a send completes,
its send request is supposed to be returned to this list and then re-used.

With 7 processes, it had reached a size of 16,324 send requests in use.  With
8 processes, it had reached 16,708.  Each send request is 720 bytes (872 in a
debug build), and if we do the math we have consumed about 12 MBytes.
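For anyone unfamiliar with the structure under discussion: a free list grows on demand, and the numAlloc figure in the logs counts total items ever created, not items currently idle. A minimal C sketch of those semantics (invented names, not the real ompi_free_list_t API) shows how a missed return translates directly into unbounded growth:

```c
#include <stdlib.h>

/* Hypothetical sketch of a grow-on-demand free list, loosely modeled on the
 * semantics described above: items are taken from the list, and every
 * completed send is supposed to return its request for re-use. If returns
 * are missed, num_allocated grows without bound. */
typedef struct item { struct item *next; } item_t;

typedef struct {
    item_t *head;          /* available (returned) items */
    size_t  num_allocated; /* total items ever created: the "numAlloc" figure */
    size_t  item_size;
} free_list_t;

static void *fl_get(free_list_t *fl) {
    if (fl->head) {                 /* re-use a previously returned item */
        item_t *it = fl->head;
        fl->head = it->next;
        return it;
    }
    fl->num_allocated++;            /* otherwise grow the list by one item */
    return malloc(fl->item_size);
}

static void fl_return(free_list_t *fl, void *p) {
    item_t *it = p;                 /* push the item back for re-use */
    it->next = fl->head;
    fl->head = it;
}
```

With matched get/return pairs num_allocated stays constant no matter how many sends run; the observed 16,324 entries times 720 bytes is the roughly 12 MB quoted above.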

Setting some type of bound will not fix this issue.  Something else is going
on here that is causing this problem.  I know you described the problem
earlier on, but maybe you can explain again?  How many processes?  What type of
cluster?  One other thought is to try Open MPI 1.7.2 to see if you
still see the problem.  Maybe someone else has suggestions too.

Rolf

PS: For those who missed a private email, I had Max add some instrumentation so 
we could see which list was growing.  We now know it is the 
mca_pml_base_send_requests list.

-----Original Message-----
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Friday, September 13, 2013 7:06 AM
To: Rolf vandeVaart; de...@open-mpi.org
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

    I applied your patch. The full output is rather big (> 10 MB even gzipped),
which is not good for the mailing list, but the head and tail are below for a
7- and an 8-process run.
It seems that the send requests are growing fast: roughly 4000-fold in just 10 minutes.

Do you know of a method to bound the list so that it does not grow excessively?

thanks

Max

7 Processor run
------------------
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0
(the same block repeats four more times)


......

[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0
(the same block repeats four more times)


8 Processor run
--------------------

[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0
(the same block repeats four more times)


...

[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0
(the same block repeats four more times)



On 12.09.2013 17:04, Rolf vandeVaart wrote:
Can you apply this patch and try again?  It will print out the sizes of the free
lists after every 100 calls into mca_pml_ob1_send.  It would be interesting
to see which one is growing.
This might give us some clues.
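The instrumentation idea can be sketched roughly as follows (a hedged illustration with invented names; the actual patch prints one Freelist=... line per list, as seen in the logs above):

```c
/* Rough sketch of the instrumentation described above (names invented):
 * count entries into the send path and dump free-list sizes on every
 * 100th call. */
static int send_calls = 0;
static int dumps = 0;

static void pml_send_entry_hook(void) {
    if (++send_calls % 100 == 0) {
        dumps++;  /* the real patch would print each Freelist=... line here */
    }
}
```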

Rolf

-----Original Message-----
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Thursday, September 12, 2013 3:53 AM
To: Rolf vandeVaart
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

     the heap snapshots I take tell me where and when the memory was
allocated, and a simple source trace tells me that the calling routine was
mca_pml_ob1_send, and that all of the ~100,000 individual allocations during
the run happened because of an MPI_ALLREDUCE call in exactly one place in the code.
The tool I use for this is MemoryScape, but I think Valgrind can tell you the
same thing. However, I have not yet been able to reproduce the problem in a
simpler program; I suspect it has something to do with the locking mechanism
of the list elements. I don't know enough about OMPI to comment on that, but
it looks like the list is growing because all of its elements are locked.

really any help is appreciated

Max

PS:

If I mimic ALLREDUCE with 2*Nproc SEND and RECV commands (aggregating on
proc 0 and then sending the result back out to all procs), I get the same
kind of behaviour.
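The data flow of that mimicked collective can be sketched without any real message passing (a toy model, not MPI code): gather every rank's value to rank 0, sum there, and send the sum back out, which yields on every rank the same result as MPI_ALLREDUCE with MPI_SUM.

```c
/* Toy model (no real MPI) of the 2*Nproc SEND/RECV pattern: every rank's
 * value is "received" by rank 0, summed, and the sum is "sent" back to
 * each rank -- semantically the result of MPI_ALLREDUCE with MPI_SUM. */
#define NPROC 8

static void mimic_allreduce_sum(const double in[NPROC], double out[NPROC]) {
    double sum = 0.0;
    for (int r = 0; r < NPROC; r++)   /* rank 0 receives one value per rank */
        sum += in[r];
    for (int r = 0; r < NPROC; r++)   /* rank 0 sends the result back out */
        out[r] = sum;
}
```

That this hand-rolled version shows the same free-list growth suggests the issue is in the point-to-point send path itself rather than in the collective implementation.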
On 11.09.2013 17:12, Rolf vandeVaart wrote:
Hi Max:
You say that the function keeps "allocating memory in the pml free list."
How do you know that is happening?
Do you know which free list it is happening on?  There are something like 8
free lists associated with the pml ob1, so it would be interesting to know
which one you observe growing.
Rolf

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Max Staufer
Sent: Wednesday, September 11, 2013 10:23 AM
To: de...@open-mpi.org
Subject: [OMPI devel] Nearly unlimited growth of pml free list

Hi All,

       as I already asked in the users list, where I was told that's not
the right place to ask: I came across a misbehaviour of Open MPI versions
1.4.5 and 1.6.5 alike.
The mca_pml_ob1_send function keeps allocating memory in the pml free
list.
It does so indefinitely. In my case the list grew to about 100 GB.

I can control the maximum using the pml_ob1_free_list_max
parameter, but then the application just stops working when that
number of entries in the list is reached.

The interesting part is that the growth only happens in a single
place in the code, which is a RECURSIVE SUBROUTINE.

And the called function is MPI_ALLREDUCE(... MPI_SUM).

Apparently it's not easy to create a test program that shows the
same behaviour; recursion alone is not enough.

Is there an MCA parameter that limits the total list size
without making the application stop?

Or is there a way to enforce the lock on the free list entries?

Thanks for all the help

Max
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may
contain confidential information.  Any unauthorized review, use, disclosure
or distribution is prohibited.  If you are not the intended recipient,
please contact the sender by reply email and destroy all copies of the
original message.
-----------------------------------------------------------------------------------
