I have seen this advice to avoid garbage collection in batch from IBMers
before. I don't understand it, and I am curious to know where it comes
from. I doubt it is endorsed by the JVM developers. I suspect it might
just be that in Java we can suddenly measure memory management overhead,
whereas in other languages it is much harder to measure.
Garbage collection is Java's way of returning unused memory for reuse.
You could reduce the memory management overhead of a batch C++ program
by removing all delete statements and increasing the virtual storage
available until it never ran out. You COULD, but no one would recommend
it as good practice. Overallocating the heap to avoid garbage collection
is basically the same thing.
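On the point about measurability: the JVM reports its own memory management costs directly, which C++ never did. Here is a minimal sketch using the standard GarbageCollectorMXBean API (collector names, counts, and times will vary by JVM and GC policy):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Create short-lived garbage so at least one collection is likely.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] junk = new byte[1024];
        }
        // Each collector exposes a cumulative count and pause time.
        long totalCollections = 0;
        long totalPauseMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            totalCollections += gc.getCollectionCount();
            totalPauseMillis += gc.getCollectionTime();
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("Total: %d collections, %d ms%n",
                totalCollections, totalPauseMillis);
    }
}
```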
Applications tend to evolve and grow over time. If you deliberately set
up your application to avoid GC, you may be in for a rude shock when the
application grows and one day GC is triggered.
There can also be performance advantages from GC. GC moves objects
together in storage, making it much more likely that your application
data will be in the processor caches. If GC keeps your data in processor
cache it will perform much better than if it's scattered across a GB of
storage.
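The cache effect is easy to demonstrate in miniature. This is not GC compaction itself, just an analogous illustration of locality: summing the same values from a contiguous backing array versus pointer-chained nodes scattered through the heap. Timings vary by machine, so take the printed numbers as indicative only:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class Locality {
    static long sum(List<Integer> list) {
        long s = 0;
        for (int v : list) s += v;
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> contiguous = new ArrayList<>(n); // backing array: good locality
        List<Integer> scattered = new LinkedList<>();  // one node per element: poor locality
        for (int i = 0; i < n; i++) {
            contiguous.add(i);
            scattered.add(i);
        }
        long t0 = System.nanoTime();
        long s1 = sum(contiguous);
        long t1 = System.nanoTime();
        long s2 = sum(scattered);
        long t2 = System.nanoTime();
        System.out.printf("ArrayList  sum=%d in %d us%n", s1, (t1 - t0) / 1000);
        System.out.printf("LinkedList sum=%d in %d us%n", s2, (t2 - t1) / 1000);
    }
}
```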
On the other points:
Stress testing - memory management is usually an important factor in
application performance, so I'm not sure how valid any stress test that
avoided garbage collection would be (processor cache effects etc. as
much as GC overhead).
Memory leaks - this applies to any language - if the memory isn't
released, of course you need enough virtual storage to support it
between restarts.
Page outs - paging Java out is very different to paging out other
applications, due to the GC memory access pattern. Yes, any inactive
application will be subject to page out (maybe? I have seen some
information about page-fixed pages for Java - I don't know anything
about it though). What you don't want is portions of the heap paged out
from an active application. When Java performs a GC it is going to touch
every page in the heap* - so if you have 200MB paged out and an innocent
50 byte memory allocation triggers GC, it has to wait for 200MB of pages
to be paged in one by one before the allocation completes (assuming
Java/z/OS don't recognize and optimize this page-in pattern). This is
different to other languages, where pages are paged in one at a time as
required, and only if they hold active data.
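Rough arithmetic on that scenario, assuming a 4KB page size and a nominal 1ms per demand page-in (both are illustrative assumptions, not measured z/OS figures):

```java
public class PageInCost {
    public static void main(String[] args) {
        long pagedOutBytes = 200L * 1024 * 1024; // 200 MB of heap paged out
        long pageSize = 4096;                    // 4 KB pages (assumed)
        long pages = pagedOutBytes / pageSize;   // 51,200 demand page-ins
        double msPerPageIn = 1.0;                // assumed per-page latency
        // ~51 seconds of waiting, triggered by a 50 byte allocation
        System.out.printf("%d page-ins, roughly %.0f seconds before the allocation completes%n",
                pages, pages * msPerPageIn / 1000.0);
    }
}
```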
I'm not saying to economize on real storage - on the contrary. The
original poster asked about testing Java applications with a shortage of
real storage. My response is that the performance will probably be
unacceptable and it's not worth testing - just make sure you DO have
enough real storage for the application.
On this side track of heap size and garbage collection my advice is:
1) Do not fear garbage collection. It is part of a normal Java
application. It does need to be carefully tuned for response time
sensitive applications, but for these applications any paging of the
Java heap will likely be disastrous.
2) Do not allocate a heap so large that you risk paging instead of GC.
Paging is far worse than GC for Java performance.
n.b. the definition of "too large a heap" is a moving target. I would
say it is a heap with enough storage sitting inactive for long enough
that parts of it might be paged out. It would be unusual for a few
hundred MB in a normal batch job to be an issue.
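If you want to check what heap a job actually has - rather than what you think the JCL or JVM options set - the standard Runtime methods report it. A minimal sketch:

```java
public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() reflects the -Xmx ceiling; totalMemory() is what is
        // currently committed, which may still grow up to that ceiling.
        System.out.printf("Max heap (-Xmx):       %d MB%n", rt.maxMemory() / mb);
        System.out.printf("Committed heap:        %d MB%n", rt.totalMemory() / mb);
        System.out.printf("Free within committed: %d MB%n", rt.freeMemory() / mb);
    }
}
```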
Regards
Andrew Rowley
Black Hill Software
* My understanding of what happens. I'm happy to be corrected by someone
with more knowledge of GC strategies and internals.
On 6/08/2015 14:49, Timothy Sipples wrote:
I agree with Andrew Rowley's advice so long as it's properly understood to
be *general* advice -- "rules of thumb." There are some very interesting
exceptions. (Aren't there always? :-))
Regarding making the Java heap "too large," there are some use cases --
Java batch, notably -- where you really do want to make the heap "too
large," or at least slightly too large. If the JVM is transitory, and if
you can avoid any/all garbage collection during the transitory life of the
program, that might be a perfectly wonderful, optimal outcome. "It
depends." Another potential scenario is stress testing, perhaps during the
initial phases, when you're trying to understand the performance and
scalability characteristics of an application before allowing garbage
collection to "interfere" with your assessments. (Maybe you don't have the
best measurement tools?) Or you're simply trying to determine how much is
"too much," so you start with "too much" in your testing.
Maybe you have a defective application that's got a memory leak, and
garbage collection eventually cannot accomplish anything. The application
instance then abends. But to avoid restarting the application instance too
frequently you throw "too much" memory at the application instance(s) until
you and/or the vendor can fix the leak. (Been there.) (It depends on your
point of view what "too much" means in these cases. Theoretically such a
defective application requires an infinite amount of heap, so it can never
have "too much.")
There are situations when it can be perfectly reasonable to page out.
Examples: development and test environments, and cloned execution instances
when you don't need all the clones running but would like to have some
paged in as demand warrants. Basically anything/everything that is highly
transient, with temporary and occasional demand, but you want to avoid full
startup. It's really "thrashing" that you want to avoid. Though paging
might be necessary to produce thrashing, it's not sufficient.
All that said, I see way too many cases of operators/sysprogs/managers
perversely trying to economize on memory, some perhaps remembering the
"good old days" when "Hello World!" required only a few bytes. For better
or worse, that hasn't been true for at least a couple decades. Suck it up
and spend the memory. :-)
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN