I have run into similar situations many times at different customers' sites.
It always turned out that when the developers at the customer's site told me
they had returned all the requested storage to the heap, in fact they hadn't:
there was always a small area left that was never freed, and this area was
responsible for the heap growing on every call.
What I would do to diagnose the problem has, as far as I can see, already
been done here. The most interesting tool is the alternate heap manager
(CEL4MCHK, IIRC), which lets you track all memory allocations and frees,
and it even gives you the call chain at the point where each allocation
was made.
I then wrote a procedure (REXX, IIRC) that processed the CEL4MCHK output,
just to see whether there was a pattern in the areas that remained allocated.
For example: I run exactly 1000 requests and then look for areas (of a
certain size) that remain allocated and appear in the list 1000 times (or a
multiple of 1000 times). Then I do the same with 2000 requests and check
whether the same area now appears 2000 times, and so on. (There is no need,
BTW, to run the tests until all the memory is used up; if you do, your
traces will grow far too large.)
Areas whose count stays constant when I change the number of calls are of no
interest. But if I find areas whose count changes exactly with the number of
calls, I know the place where the allocation without a matching free was
done. Then I call the developer responsible for the module that does that
allocation and ask him or her to fix it.
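That counting idea can be sketched as follows. The record format here (size
plus caller name) is made up for illustration, not actual CEL4MCHK output:
group the leftover allocations from two runs by size and caller, and keep
only the groups whose count scales exactly with the number of requests.

```python
from collections import Counter

def leak_candidates(run_a, n_a, run_b, n_b):
    """Given leftover-allocation records from two runs that processed
    n_a and n_b requests, return the (size, caller) groups whose
    leftover count scales with the number of requests."""
    count_a = Counter(run_a)
    count_b = Counter(run_b)
    candidates = []
    for key, c_a in count_a.items():
        c_b = count_b.get(key, 0)
        # A leak of one area per request shows up n_a times in run A
        # and n_b times in run B (or an exact multiple of that).
        if c_a and c_a % n_a == 0 and c_b == c_a // n_a * n_b:
            candidates.append(key)
    return candidates

# Toy data: (size, caller) pairs that remained allocated after each run.
run_1000 = [(48, "modA")] * 1000 + [(256, "init")] * 3
run_2000 = [(48, "modA")] * 2000 + [(256, "init")] * 3
print(leak_candidates(run_1000, 1000, run_2000, 2000))  # [(48, 'modA')]
```

The constant-count group (256 bytes from "init") drops out automatically;
only the group that tracks the request count survives.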
I've done this many times, and every time I found the module or function
responsible for the storage leak within a few hours. The languages involved
(which caused the storage leaks) were C, C++ (often), and even PL/1 - this
doesn't really matter, because it's all LE.
AFAIK, for Windows, Linux, and similar platforms there is a tool called
Valgrind that performs a similar analysis.
The REXX procedure was used primarily to sort and group the requests in the
CEL4MCHK output by size, caller sequence, etc., so that numbers like the
1000 and 2000 above can easily be recognized.
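The grouping step itself is simple; a sketch (the LEFT/ADDR/SIZE/CALLER line
format below is invented for illustration - real CEL4MCHK output looks
different):

```python
import re
from collections import Counter

# Hypothetical leftover-allocation trace; each line records the size and
# the call chain of one area that was never freed.
trace = """\
LEFT ADDR=1A2B3C00 SIZE=48 CALLER=modA>parse>strdup
LEFT ADDR=1A2B3D00 SIZE=48 CALLER=modA>parse>strdup
LEFT ADDR=1A2B3E00 SIZE=256 CALLER=init>loadcfg
LEFT ADDR=1A2B3F00 SIZE=48 CALLER=modA>parse>strdup
"""

pat = re.compile(r"SIZE=(\d+) CALLER=(\S+)")
groups = Counter((int(m.group(1)), m.group(2))
                 for m in map(pat.search, trace.splitlines()) if m)

# Sort by count, largest first, so counts like 1000 or 2000 stand out.
for (size, caller), count in groups.most_common():
    print(f"{count:8d} x {size:6d} bytes from {caller}")
```

With a real trace, the leaking call chain is the one whose count matches the
number of requests you drove.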
That said: there is of course a small chance that the problem is not inside
the user's (or customer's) code but inside one of the vendor's functions (in
this case IBM's, for example the JSON processors mentioned). IMO the
probability is low, although in my career there were some rare situations
where, after 4 weeks of examining error situations, it REALLY turned out
that the error was in the IBM part, and it took me some time to convince
IBM. If you want, I can tell you more about this ... but offline.
But even if that were the case in your situation, the CEL4MCHK method IMO
would detect it.
Honestly: I believe you will find that the error is in the customer's code.
HTH, kind regards
Bernd
On 04.01.2024 at 22:45, Eric Erickson wrote:
We are in a bit of a quandary here with some memory issues surrounding our
application. This is a multitasking LE C application running in 31 bit mode
that utilizes the IBM JSON and EZNOSQL Services. Some of the attributes are:
• z/OS V2.5 Operation System
• POSIX(OFF) - all tasks/subtasks
• Single address space (31 Bit Mode)
• ATTACHX Multi-tasking model (no pthreads)
• Execute as started task – Problem State – Key 4
• Drop in/out of supervisor state as needed
• 3 EZNOSQL Databases are opened at application start and remain open
until termination
• Open EZNOSQL connections tokens are passed to the worker task(s) along
with the unit of work to be processed
Our issue is that the total heap grows until we exhaust all available memory
and the application inevitably fails. The key here is that while the total
heap grows with every unit of work processed by the tasks, the in-use amount
shows no increment, or only a small one (<128 bytes), between units of work.
For example, here is a heap report (using the LE __heaprpt function). So we
are fairly confident that our application code is not leaking memory.
HeapReport: ZdpQuery @Start - Total/In Use/Available: 1048576/ 888160/ 160416.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 1048576/ 888160/ 160416.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 1560856/ 888192/ 672664.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 1560856/ 888192/ 672664.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2073088/ 888224/ 1184864.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 2073088/ 888224/ 1184864.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2073088/ 888224/ 1184864.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 2073088/ 888224/ 1184864.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2585376/ 888256/ 1697120.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 2585376/ 888256/ 1697120.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2585376/ 888256/ 1697120.
HeapReport: ZdpQuery @Enter - Total/In Use/Available: 2585376/ 888256/ 1697120.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2585376/ 888256/ 1697120.
HeapReport: ZdpQuery @Finish - Total/In Use/Available: 2585376/ 888256/ 1697120.
The @Start and @Finish lines show the heap report just after the task is
attached and just before it terminates. Each @Enter/@Exit pair shows the
heap at the start and end of processing a unit of work, respectively.
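Pulling the deltas out of the report above makes the pattern easy to see:
each growth step adds roughly 512 KB to the total while in-use moves only 32
bytes. A small script (the report text is excerpted from the log, keeping
only the lines where the totals change):

```python
import re

report = """\
HeapReport: ZdpQuery @Start - Total/In Use/Available: 1048576/ 888160/ 160416.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 1560856/ 888192/ 672664.
HeapReport: ZdpQuery @Exit - Total/In Use/Available: 2073088/ 888224/ 1184864.
HeapReport: ZdpQuery @Finish - Total/In Use/Available: 2585376/ 888256/ 1697120.
"""

pat = re.compile(r"(@\w+) - Total/In Use/Available:\s*(\d+)/\s*(\d+)/\s*(\d+)")
rows = [(m.group(1), *map(int, m.groups()[1:]))
        for m in map(pat.search, report.splitlines()) if m]

# Print how much Total and In Use changed at each step.
prev = rows[0]
for tag, total, used, avail in rows[1:]:
    print(f"{tag:8s} total {total - prev[1]:+8d}  in-use {used - prev[2]:+5d}")
    prev = (tag, total, used, avail)
```

The total deltas (+512280, +512232, +512288) look like a fixed ~512 KB heap
extension, while in-use creeps by exactly 32 bytes per extension.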
We are at a loss to explain why the heap keeps growing. We would expect the
heap to grow to some high-water mark and then stabilize, but the total size
just keeps growing until the application fails with an out-of-memory
condition, even though a significant amount of heap storage is available.
Our tasks are returning all the storage they directly allocate back to the
heap, as indicated by the in-use numbers at start and end. While there is a
small increment in the in-use number, we think that may just be LE overhead
in managing the heap; in any case it is generally less than 128 bytes per
iteration and only appears when the total heap size increases. What makes
this example even more interesting is that we are processing the exact same
request on every iteration.
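One mechanism that can produce exactly this signature - total keeps growing
while nearly nothing stays in use - is fragmentation of the free space. The
toy model below is purely illustrative (it is not LE's actual algorithm, and
the increment size is an assumption): if free blocks are never coalesced and
some request is slightly larger than any single free block, the heap must be
extended even though the total free bytes would easily cover the request.

```python
INCREMENT = 512 * 1024  # assumed heap extension size

class ToyHeap:
    def __init__(self):
        self.total = 0
        self.free_blocks = []          # sizes of free blocks, never merged

    def alloc(self, size):
        for i, b in enumerate(self.free_blocks):
            if b >= size:              # first fit: split an existing block
                self.free_blocks[i] = b - size
                return
        self.total += INCREMENT        # nothing fits: extend the heap
        self.free_blocks.append(INCREMENT - size)

    def free(self, size):
        self.free_blocks.append(size)  # returned to the heap, not merged

heap = ToyHeap()
for unit in range(4):                  # four units of work
    size = (300 + 10 * unit) * 1024    # transient buffer grows slightly
    heap.alloc(size)
    heap.free(size)                    # everything is given back ...

# ... yet the heap was extended on every unit of work:
print(heap.total, sum(heap.free_blocks))   # 2097152 2097152
```

Whether something like this (as opposed to an unfreed area) is what is
happening here is exactly what the CEL4MCHK-style counting would settle.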
We’ve turned on all the various LE memory analysis options (HEAPCHK, RPTSTG)
and utilized the LE alternate heap manager to detect overlays, corruption,
etc. This pointed us to a couple of minor leaks, which we plugged, but it
has not led us to an answer for the growing heap. We make heavy use of the
IBM JSON and EZNOSQL services during processing.
We are in search of any insight or recommendations on how to proceed in
diagnosing this issue.
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
----------------------------------------------------------------------