Pawel (privately) wrote:
Hi,

I know that it sounds unbelievable to most of you, but I would like to 
share the problems that we started to face on our test servers about 
2-3 days ago.

We started to get out-of-memory errors on our test servers, e.g.:
jsh t24fe ~ -->jdiag
 ** Warning [ PERFORM_ERROR ] **
Unix error number 0 while attempting PERFORM , Line   111 , Source jsh.b
Trap from an error message, error message name = PERFORM_ERROR
Line 111 , Source jsh.b
jBASE debugger->q
Are you sure ?y
  
Pawel,

Have you set the environment variable on AIX that causes memory allocation to be from the top of the heap down? It changes the allocation algorithm on AIX and is a huge improvement over the standard one. In fact, I suspect that when the engineers produced this new algorithm for allocation patterns like your batch job, the standard one stopped trying to deal with them, though the literature does not indicate this:

Improvements to malloc subsystem
The number of malloc-related environment variables supported by AIX 5L Version 5.3 has been reduced to three, and the attributes they can assume have been redefined. These environment variables are: MALLOCTYPE, MALLOCOPTIONS, and MALLOCDEBUG. MALLOCOPTIONS is a new environment variable that has been added to take care of all current and future options to the MALLOCTYPE allocators. It supplants MALLOCBUCKETS, MALLOCMULTIHEAP, and MALLOCDISCLAIM, which have been deprecated.

The three environment variables have the following definitions and some of the attributes they can assume:

MALLOCTYPE Used to specify allocator type. Attributes include: 3.1, default allocator, watson, and user.
MALLOCOPTIONS Used to specify allocator options. Attributes include: buckets, disclaim, threadcache, and multiheap.
MALLOCDEBUG Used to specify debugging options. Attributes include: catch_overflow, report_allocations, Malloc Log, Malloc Trace, and validate_ptrs.

The following enhancements have also been incorporated: ... etc.

I think that you want:

export MALLOCTYPE=watson

in your .profile

However, there are other options to tune as well, including a malloc cache and so on, so read up on them. It is safe to go right ahead and try that simple export, but don't give up on it if it seems not to improve things right away.
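As a sketch, the .profile entries might look like this; the MALLOCOPTIONS line is just one possible combination of the options quoted above, not something you must set:

```shell
# Switch the AIX malloc subsystem to the Watson allocator (AIX 5L 5.3+)
export MALLOCTYPE=watson
# Optionally enable per-thread caching and multiple heaps as well;
# these are tuning options, not requirements - read up before using them
export MALLOCOPTIONS=threadcache,multiheap
# Quick sanity check that the variables will be seen by child processes
echo "MALLOCTYPE=$MALLOCTYPE MALLOCOPTIONS=$MALLOCOPTIONS"
```

Remember these only affect processes started after the variables are exported, so restart your jBASE sessions after changing .profile.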


We have several instances of a banking product on each test server. 
We run batch processing (COB) on these test servers - usually 2-3 
environments at the same time on one single AIX "test" server.
Our particular situation is that we run these COBs using multiple 
agents - say 20 processes per environment, giving 60 agents in total 
(60 jBASE processes running at the same time on one test machine).

Today I was informed about the problem for the first time, and we checked 
memory allocation. It showed that we simply run out of it (both physical 
memory and swap are exhausted).
  
Also check your local settings here. A lot of malloc algorithms pre-allocate swap space in case they NEED to swap, so you can run out of swap even though you have not actually used any. For this, you either allocate a huge amount of swap (disk is cheap) or find the tuning options that tell the system to run in optimistic mode for swap, which means it doesn't pre-allocate the swap and you are on your own if you run out of it.
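For reference, on AIX the paging-space situation can be inspected and grown with lsps and chps; the paging-space name and the size below are placeholders for your own system:

```shell
# List each paging space with its size and percentage used
lsps -a
# Summary view of total paging space
lsps -s
# Grow an existing paging space by 8 logical partitions (disk is cheap);
# "paging00" and the size are examples only - substitute your own values
chps -s 8 paging00
```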
We have also found out that some of the jBASE (COB) processes are 
consuming large amounts of memory, for example:
  16    eoyrx08   61482  633 (521) 1025  26K 9.38M  395K 6541 1212M   35m  
2 SLEEP tSA 8 (BATCH.JOB.CONTROL,593)
  26    t24ferx  503830  786 (776)  228 266K 12.2M 1.33M 1814 1321M   34m  
2 SLEEP tSA 7 (BATCH.JOB.CONTROL,323)
  28    t24ferx  430118  808 (799)  200 240K 17.6M 1.28M 1813 1200M   45m  
2 tSA 6 (BATCH.JOB.CONTROL,322)
  
You have to be a bit careful here, as you need to distinguish between memory that is shared by all processes (for mmap() of files and libraries) and real data memory consumed by your application. top seems to be reasonably good at making that distinction, though.

However, what usually happens here is poor application programming. You can of course only inspect your local code, but you should look for subroutines that do things like logging but never free up the variables they use to accumulate log records - things like that.
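A purely hypothetical jBASE BASIC example of the pattern I mean - the subroutine name and COMMON block are invented for illustration:

```
    SUBROUTINE APP.LOG(MSG)
* Hypothetical logging routine that accumulates records in COMMON.
* Each call appends a record, but nothing ever clears the buffer,
* so a long-running batch process grows without bound.
    COMMON /APPLOG/ LOG.BUFFER
    LOG.BUFFER<-1> = TIMEDATE() : ' ' : MSG
    RETURN
    END
```

The fix is to flush the buffer to a file at sensible intervals and reset it (LOG.BUFFER = '') rather than letting it accumulate for the life of the process.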

I know that somebody may start to suggest that our local C/C++ code 
is causing memory leaks. Please believe me that we do not run any C/C++ 
code during batch processing. We have only one (MQ) library written in C 
and interfaced (DEFC) to jBASE. It was done by an external vendor and has 
been LIVE for 5 years. It was thoroughly tested for memory leaks a few 
years ago. You have to believe me, but this library does not run 
during COB. It is used only by some online processes.
  
As the only person to have written an MQ interface was me, I have supreme confidence in your MQ links ;-)
Therefore I claim that something must be wrong with jBASE. 
I see how you got here, but I would be extremely surprised :-) In 19 years this has rarely been the case, though Greg and I had to fly all over the world more times than I can remember to show that it wasn't ;-). That doesn't mean the fault can't be there, but it almost always isn't.
My guess is 
that jBASE does not free the "transaction buffer" (does not downsize it 
once the transaction is finished).
  
Nope, the system wouldn't run for more than 5 minutes if that were the case.
There are some (single-threaded) jobs during our COB that create huge 
transactions (e.g. 900K changes in one transaction). It seems to me that 
this "changes" buffer is never downsized, or this memory simply "leaks" 
somehow.
  
Well, the best answer to that is of course to fix those jobs, as it is a stupid design [well, it isn't a design ;-)], but they are probably not your programs. However, while the internal buffer might grow, even if it were never shrunk it would be reused, not lost and re-allocated: it would soon reach the maximum that your application needed and stay there. Hence, if this were the issue, it would be because your application was just growing the transaction forever.

However, I suspect that the buffer allocation for such large transactions might be the root of your problem and that if you change the malloc algorithm to Watson, it will have a much easier time of it.
Has anyone faced similar problems?

jBASE version 4.1.5.17 (Major 4.1 , Minor 5.17 , Patch 5690 (Change 
52756)), AIX 5.3.0.0-06.
  
Yes - it is usually the allocation algorithm you are using and rogue applications :-)

Jim


--~--~---------~--~----~------------~-------~--~----~
Please read the posting guidelines at: http://groups.google.com/group/jBASE/web/Posting%20Guidelines

IMPORTANT: Type T24: at the start of the subject line for questions specific to Globus/T24

To post, send email to [email protected]
To unsubscribe, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/jBASE?hl=en
-~----------~----~----~----~------~----~------~--~---