> Can I please get a review of this change which proposes to fix an issue in
> `java.util.zip.ZipFile` which would cause failures when multiple instances of
> `ZipFile` using a non-UTF8 `Charset` were operating against the same
> underlying ZIP file? This addresses
> https://bugs.openjdk.org/browse/JDK-8347712.
>
> The ZIP file specification allows ZIP entries to set a `UTF-8` flag to
> indicate that the entry name and comment are encoded using UTF8. A
> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This
> `Charset` (which defaults to UTF-8) gets used for decoding entry names and
> comments for non-UTF8 entries.
>
> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by a
> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given
> `Charset`. Except for the UTF8 `ZipCoder`, other `ZipCoder`s are not thread
> safe.
>
> The internal implementation of `ZipFile` maintains a cache of
> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and,
> during construction, uses a `ZipCoder` for parsing the ZIP entries; once
> constructed, it holds on to the parsed ZIP structure. Multiple instances of
> `ZipFile` which all correspond to the same ZIP file on the filesystem share
> a single instance of `Source` (after the `Source` has been constructed and
> cached). Although `ZipFile` instances aren't expected to be thread-safe, the
> fact that multiple different instances of `ZipFile` could be sharing the same
> instance of `Source` in concurrent threads mandates that the `Source` must
> be thread-safe.
>
> In Java 15, we did a performance optimization through
> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we
> started holding on to the `ZipCoder` instance (corresponding to the `Charset`
> provided during `ZipFile` construction) in the `Source`. This stored
> `ZipCoder` was then used for `ZipFile` operations when working with the ZIP
> entries. As noted previously, any non-UTF8 `ZipCoder` is not thread-safe and,
> as a result, any usage of `ZipCoder` in the `Source` makes `Source` not
> thread-safe either. That effectively violates the requirement that `Source`
> must be thread-safe to allow for its usage in multiple different `ZipFile`
> instances concurrently. This then causes `ZipFile` usages to fail in
> unexpected ways like the one shown in the linked
> https://bugs.openjdk.org/browse/JDK-8347712.
>
> The commit in this PR addresses the issue by not maintaining `ZipCoder` as an
> instance field of `Source`. Instead the `ZipCoder` is now maintained in the
> `ZipFile`,...
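
For readers following the description above, the idea of keeping charset state out of the shared object can be sketched roughly as below. This is only an illustrative model, not the JDK code: `SharedSource`, `Handle` and `decodeConcurrently` are hypothetical names, `SharedSource` stands in for the cached `Source` (immutable bytes only), and `Handle` stands in for a `ZipFile` instance that owns its own `CharsetDecoder`. It assumes each `Handle` is used by a single thread, mirroring the note that `ZipFile` instances themselves are not expected to be thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerInstanceCoderSketch {

    // Hypothetical stand-in for the shared, cached Source: it exposes only
    // immutable raw bytes and holds no encoder/decoder state.
    static final class SharedSource {
        private final byte[] rawName;

        SharedSource(byte[] rawName) {
            this.rawName = rawName.clone();
        }

        byte[] rawName() {
            return rawName.clone();
        }
    }

    // Hypothetical stand-in for one ZipFile instance: it owns its decoder,
    // so the stateful decoder is never shared between instances.
    static final class Handle {
        private final SharedSource source;
        private final CharsetDecoder decoder;

        Handle(SharedSource source, Charset charset) {
            this.source = source;
            this.decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
        }

        String entryName() throws CharacterCodingException {
            // decode(ByteBuffer) resets the decoder and runs a complete
            // decoding operation on the given bytes.
            return decoder.decode(ByteBuffer.wrap(source.rawName())).toString();
        }
    }

    // Several threads share one SharedSource, but each creates its own Handle.
    static List<String> decodeConcurrently(String name, Charset charset,
                                           int threads, int iterations) throws Exception {
        SharedSource source = new SharedSource(name.getBytes(charset));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    Handle handle = new Handle(source, charset);
                    String last = null;
                    for (int i = 0; i < iterations; i++) {
                        last = handle.entryName();
                    }
                    return last;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // "caf\u00e9.txt" is representable in ISO-8859-1 but is not ASCII.
        System.out.println(decodeConcurrently("caf\u00e9.txt",
                StandardCharsets.ISO_8859_1, 4, 1000));
    }
}
```

The design point is that the shared object stays safe to use from concurrent threads because it carries no mutable coder state, while the per-instance decoder is only ever touched by the thread using that instance.
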
Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:

  Lance's review - update code comment in the test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23986/files
  - new: https://git.openjdk.org/jdk/pull/23986/files/9a29b960..71cb9781

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23986&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23986&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23986.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23986/head:pull/23986

PR: https://git.openjdk.org/jdk/pull/23986