Hi Neil,

It appears to be a "regression" in scenario you described (user application never close the input stream after use and the ZipFile instance being retained during the lifetime of the
process). The proposed approach seems to solve this particular problem.

Here are my comments regarding the patch.

(1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, is it really necessary? given the rest of the implementation doesn't guarantee the returned InputStream object is thread-safe (the implementation does make sure the access to the underlying ZipFile is thread-safe, though). The new finalize() now also invokes close(), which means the ZipFile object now gets locked twice (assume the good manner application does
     invoke the close() after use) for each InputStream after use.

(2) Is there any particular reason that we don't want to "reuse" the Inflater(), if the close comes from the finalize() in your testing evn? One thing we noticed before is that the moment the InputStream gets finalized, the Inflater embedded might have been finalized already, this is the reason why we have a "inf.ended()" check in releaseInflater().

The disadvantage/limitation of existing Inflater caching mechanism is that it does not have a "cap" to limit the maximum number of inflater it should cache. Are you worried too many inflaters get cached, if they only get collected during GC/finalize()? This actually indicates that the cache mechanism does not work at all in your scenario, even with the proposed fix. An alternative solution (better?) is to make ZFIIS to check the bytes read during its read() method, and pro-actively close the stream when it reaches the end of the stream, same as what ZipFileInputStream does, which I believe should solve the problem in most use scenario (except, the application does not bother reading the
     InputSteeam to its end, which is unlikely).

(3) The attached test case does "f.close()" in its finally block, I guess it should be f.delete()?

-Sherman



On 03/23/2011 02:46 PM, Neil Richards wrote:
Hi,
I've noticed that the fix introduced to address bug 6735255 [1] [2] had
an unfortunate side-effect.

That fix stores references to the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
the close() method may be called upon those objects when ZipFile.close()
is called.

It does this to conform to the existing API description, and to avoid a
native memory leak which would occur if the InputStream is GC'd without
its close() being called.

However, by holding these InputStreams in a set within their ZipFile
object, their lifecycle is now prolonged until the ZipFile object either
has its close() method called, or is finalized (prior to GC).

I've found scenarios (in user code) were ZipFile objects are held onto
for a long time (eg. the entire lifetime of the process), but where
InputStream objects from that ZipFile (for individual entries in the zip
file) are obtained, used, then abandoned on a large number of occasions.

(An example here might be a user-defined, long-lasting class loader,
which might retain a ZipFile instance for each jar file in its search
path, but which will create transitory input streams to read out
particular files from those (large) jar files).

I see that the InputStream objects returned from
ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of
sizeable byte arrays, one 512 bytes in size, the other 1002 bytes.

So by prolonging the life span of these objects, the fix for 6735255 can
inadvertently cause a significant increase in the demand/usage on the
heap (in practice, running into many Mb's).

(This is not so if the user code is good-mannered enough to explicitly
always call close() on the InputStreams in question.
However, experience shows that user code cannot be relied upon to behave
so benignly).

I believe this introduced problem can be addressed by:
      1. Holding references to the InputStreams in the 'streams' Set
         within ZipFile via WeakReferences, so they may be GC'd as soon
         as they are no longer externally (strongly) referenced.
      2. Adding finalization logic to the InputStream implementation
         classes, which ensures their close() method is called prior to
         GC.
      3. Prevent Inflater objects from being returned to the pool (in the
         ZipFile object) if close() has been called from finalize().

To that end, please find below an 'hg export' of a proposed change which
implements these aspects, together with a testcase to demonstrate the
problem/fix.

Any comments / suggestions on this gratefully received,
Thanks,
Neil

[1] http://bugs.sun.com/view_bug.do?bug_id=6735255
[2] http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/49478a651a28


Reply via email to