Can I please get a review of this change which proposes to fix an issue 
`java.util.zip.ZipFile` which would cause failures when multiple instances of 
`ZipFile` using non-UTF8 `Charset` were operating against the same underlying 
ZIP file? This addresses https://bugs.openjdk.org/browse/JDK-8347712.

ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to 
indicate that the entry name and comment are encoded using UTF8. A 
`java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This 
`Charset` (which defaults to UTF-8) gets used for decoding entry names and 
comments for non-UTF8 entries.

The internal implementation of `ZipFile` uses a `ZipCoder` (backed by 
`java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given 
`Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.

The internal implementation of `ZipFile` maintains a cache of `ZipFile$Source`. 
A `Source` corresponds to the underlying ZIP file and during construction, uses 
a `ZipCoder` for parsing the ZIP entries and once constructed holds on to the 
parsed ZIP structure. Multiple instances of a `ZipFile` which all correspond to 
the same ZIP file on the filesystem, share a single instance of `Source` (after 
the `Source` has been constructed and cached). Although `ZipFile` instances 
aren't expected to be thread-safe, the fact that multiple different instances 
of `ZipFile` could be sharing the same instance of `Source` in concurrent 
threads, mandates that the `Source` must be thread-safe.

In Java 15, we did a performance optimization through 
https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we started 
holding on to the `ZipCoder` instance (corresponding to the `Charset` provided 
during `ZipFile` construction) in the `Source`. This stored `ZipCoder` was then 
used for `ZipFile` operations when working with the ZIP entries. As noted 
previously, any non-UTF8 `ZipCoder` is not thread-safe and as a result, any 
usages of `ZipCoder` in the `Source` makes `Source` not thread-safe too. That 
effectively violates the requirement that `Source` must be thread-safe to allow 
for its usage in multiple different `ZipFile` instances concurrently. This then 
causes `ZipFile` usages to fail in unexpected ways like the one shown in the 
linked https://bugs.openjdk.org/browse/JDK-8347712.

The commit in this PR addresses the issue by not maintaining `ZipCoder` as a 
instance field of `Source`. Instead the `ZipCoder` is now maintained in the 
`ZipFile`, thus allowing a separate `ZipCoder` instance per `ZipFile`. The 
`Source` when being constructed and when parsing the entries, will use a method 
local `ZipCoder` instance as and when needed.

The change that was done in Java 15 was a performance optimization. I have 
verified that the current proposed change in this PR doesn't degrade the 
performance and yet fixes the reported issue. A new jtreg test has been 
introduced to reproduce the issue and verify the fix. Local testing with this 
change passed. tier testing is currently in progress.

-------------

Commit messages:
 - 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 
charset

Changes: https://git.openjdk.org/jdk/pull/23986/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23986&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347712
  Stats: 240 lines in 3 files changed: 188 ins; 13 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/23986.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23986/head:pull/23986

PR: https://git.openjdk.org/jdk/pull/23986

Reply via email to