I had similar situations a lot of times at different customers' locations.

It always turned out that, although the developers at the customer's site told me they had returned all the requested storage to the heap, in fact they hadn't. There was always a small area left which was not returned or freed, and this area was responsible for the heap growing on every call.

What I did to diagnose the problem has already been done here, as far as I can see. The most interesting tool is the alternate heap manager (CEL4MCHK, IIRC), which lets you track all memory allocations and frees, and it even gives you the stack trace at the point where each allocation was made.

I then wrote a procedure (in REXX, IIRC) which processed the output of CEL4MCHK, just to see whether there was a pattern in the areas that remained allocated. For example: I run exactly 1000 requests and then look for areas (of a certain size) which remain allocated and appear in the list 1000 times (or a multiple of 1000 times). Then I do the same with 2000 requests and check whether the same area now appears 2000 times, and so on. (There is no need, BTW, to run the tests until all the memory is used up; if you do, your traces will grow much too large.)

Areas whose count stays constant when I change the number of calls are of no interest. But if I find areas whose count changes exactly with the number of calls, I know the place where the allocation without a matching free was done. Then I call the developer responsible for the module that does that allocation and ask him or her to fix it.
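The counting idea above can be sketched roughly as follows (in Python rather than REXX, and with an invented record format; the real CEL4MCHK output looks different, so treat this purely as an illustration of the grouping-and-counting logic):

```python
from collections import Counter

def leak_suspects(records, n_calls):
    """Group still-allocated areas by (size, caller) and flag groups
    whose count is an exact multiple of the number of requests run.
    `records` is a list of (size, caller) tuples; this format is
    invented for illustration, not the real heap-checker output."""
    counts = Counter(records)
    return {key: cnt for key, cnt in counts.items()
            if cnt % n_calls == 0}

# Example: after 1000 requests, two areas show up 1000 and 2000 times
# (leak candidates), while a constant-count area (3 startup
# allocations) is filtered out.
records = ([(48, "parse_json")] * 1000 +
           [(16, "node_alloc")] * 2000 +
           [(256, "init_pool")] * 3)
print(leak_suspects(records, 1000))
# → {(48, 'parse_json'): 1000, (16, 'node_alloc'): 2000}
```

Re-running with a different request count (say 2000) and keeping only the groups flagged both times is what separates real per-call leaks from coincidental counts.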

I've done this many times, and every time I found the module or function responsible for the storage leak within a few hours. The languages involved (which led to the storage leaks) were C, C++ (often), and even PL/I - it doesn't really matter, because it's all LE.

AFAIK, for Windows, Linux, and similar platforms there is a tool called Valgrind which does a similar analysis.

The REXX procedure was used primarily to sort and group the requests in the CEL4MCHK output by size, caller sequence, etc., so that the numbers (like the 1000 and 2000 above) can easily be recognized.

That said: there is of course a small chance that the problem is not in the user's (or customer's) code but in some of the vendor's functions (in this case IBM's, for example the JSON processors mentioned). But IMO the probability is low ... although in my career there were some rare situations where, after four weeks of examining error situations, it REALLY turned out that the error was in the IBM part ... and it took me some time to convince IBM. If you want, I can tell you more about this ... but offline.

But even if that were the case in your situation, the CEL4MCHK method IMO would detect it. Honestly: I believe you will find that the error is in the customer's code.

HTH, kind regards

Bernd


Am 04.01.2024 um 22:45 schrieb Eric Erickson:
We are in a bit of a quandary here with some memory issues surrounding our 
application. This is a multitasking LE C application running in 31 bit mode 
that utilizes the IBM JSON and EZNOSQL Services. Some of the attributes are:

•       z/OS V2.5 Operation System
•       POSIX(OFF) - all tasks/subtasks
•       Single address space (31 Bit Mode)
•       ATTACHX Multi-tasking model (no pthreads)
•       Execute as started task – Problem State – Key 4
•       Drop in/out of supervisor state as needed
•       3 EZNOSQL Databases are opened at application start and remain open 
until termination
•       Open EZNOSQL connections tokens are passed to the worker task(s) along 
with the unit of work to be processed

Our issue is that the total heap grows until we exhaust all available memory and the 
application inevitably fails. The key here is that while the total heap grows with 
every unit of work processed by the tasks, the in-use amount shows no increment, or 
only a small one (<128 bytes), between units of work. For example, here is a heap 
report example (using the LE __heaprpt function). So we are fairly confident that 
our application code is not leaking memory.

HeapReport: ZdpQuery @Start  - Total/In Use/Available:   1048576/    888160/    160416.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   1048576/    888160/    160416.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   1560856/    888192/    672664.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   1560856/    888192/    672664.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   2073088/    888224/   1184864.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   2073088/    888224/   1184864.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   2073088/    888224/   1184864.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   2073088/    888224/   1184864.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   2585376/    888256/   1697120.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   2585376/    888256/   1697120.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   2585376/    888256/   1697120.
HeapReport: ZdpQuery @Enter  - Total/In Use/Available:   2585376/    888256/   1697120.
HeapReport: ZdpQuery @Exit   - Total/In Use/Available:   2585376/    888256/   1697120.
HeapReport: ZdpQuery @Finish - Total/In Use/Available:   2585376/    888256/   1697120.

The @Start and @Finish lines show the heap report results just after the task 
is attached and before it terminates. Each of the @Enter/@Exit lines show the 
heap at the unit of work start and end processing, respectively.

We are at a loss to explain why the heap keeps growing. We would expect the heap to 
grow to some high-water mark and stabilize, but the total size just keeps growing 
until the application fails with an out-of-memory condition, even though there is a 
significant amount of heap storage available. Our tasks are returning all the 
storage they directly allocate back to the heap, as indicated by the in-use figures 
at start and end. While there is a small increment in the in-use number, we think 
that may just be LE overhead in managing the heap; in any case it is generally less 
than 128 bytes per iteration, and only appears when the total heap size increases. 
What makes this example even more interesting is that we are processing the exact 
same request for each iteration.
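The pattern in a report like the one above can be checked mechanically: extract the Total/In Use/Available triples and compute per-step deltas. A minimal sketch in Python (the sample lines below are abbreviated copies of the report quoted above; the regex assumes that line format):

```python
import re

def heap_deltas(report_lines):
    """Extract Total/In Use/Available triples from __heaprpt-style
    report lines and return (total_delta, in_use_delta) per step.
    A growing total with a near-zero in-use delta points at heap
    segment growth rather than an ordinary application-level leak."""
    triples = []
    for line in report_lines:
        m = re.search(r"Total/In Use/Available:\s*(\d+)/\s*(\d+)/\s*(\d+)", line)
        if m:
            triples.append(tuple(int(g) for g in m.groups()))
    return [(b[0] - a[0], b[1] - a[1])
            for a, b in zip(triples, triples[1:])]

report = [
    "HeapReport: ZdpQuery @Start  - Total/In Use/Available: 1048576/ 888160/ 160416.",
    "HeapReport: ZdpQuery @Exit   - Total/In Use/Available: 1560856/ 888192/ 672664.",
    "HeapReport: ZdpQuery @Exit   - Total/In Use/Available: 2073088/ 888224/ 1184864.",
]
print(heap_deltas(report))
# → [(512280, 32), (512232, 32)]
```

On the numbers quoted, each growth step adds roughly 512 KB to the total while in-use moves by only 32 bytes, which is consistent with new heap increments being obtained even though previously freed storage should be available.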

We've turned on all the various LE memory analysis options (HEAPCHK, RPTSTG) 
and utilized the LE alternate heap manager to detect overlays, corruption, 
etc. This pointed us to a couple of minor leaks, which we plugged, but it has not 
led us to an answer on the growing heap. We make heavy use of the IBM JSON and 
EZNOSQL services during processing.

We are in search of any insight or recommendations on how to proceed in 
diagnosing this issue.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
