On Wed, 19 Mar 2025 14:37:33 GMT, Jaikiran Pai <j...@openjdk.org> wrote:

>> Can I please get a review of this change which proposes to fix an issue 
>> `java.util.zip.ZipFile` which would cause failures when multiple instances 
>> of `ZipFile` using non-UTF8 `Charset` were operating against the same 
>> underlying ZIP file? This addresses 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to 
>> indicate that the entry name and comment are encoded using UTF8. A 
>> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This 
>> `Charset` (which defaults to UTF-8) gets used for decoding entry names and 
>> comments for non-UTF8 entries.
>> 
>> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by 
>> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given 
>> `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
>> 
>> The internal implementation of `ZipFile` maintains a cache of 
>> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and 
>> during construction, uses a `ZipCoder` for parsing the ZIP entries and once 
>> constructed holds on to the parsed ZIP structure. Multiple instances of a 
>> `ZipFile` which all correspond to the same ZIP file on the filesystem, share 
>> a single instance of `Source` (after the `Source` has been constructed and 
>> cached). Although `ZipFile` instances aren't expected to be thread-safe, the 
>> fact that multiple different instances of `ZipFile` could be sharing the 
>> same instance of `Source` in concurrent threads, mandates that the `Source` 
>> must be thread-safe.
>> 
>> In Java 15, we did a performance optimization through 
>> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we 
>> started holding on to the `ZipCoder` instance (corresponding to the 
>> `Charset` provided during `ZipFile` construction) in the `Source`. This 
>> stored `ZipCoder` was then used for `ZipFile` operations when working with 
>> the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not 
>> thread-safe and as a result, any usages of `ZipCoder` in the `Source` makes 
>> `Source` not thread-safe too. That effectively violates the requirement that 
>> `Source` must be thread-safe to allow for its usage in multiple different 
>> `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail 
>> in unexpected ways like the one shown in the linked 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a 
>> instance field of `Source`. Instead the `ZipCoder` is now mainta...
>
> Jaikiran Pai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains nine additional 
> commits since the last revision:
> 
>  - Eirik's suggestion - update test method comment
>  - rename entryNameCharset to charset in the test
>  - Eirik's suggestion - update test to even call ZipFile.getEntry()
>  - Eirik's inputs - replace useUTF8Coder() with zipCoderFor()
>  - merge latest from master branch
>  - add code comment
>  - rename field
>  - tiny typo fix in newly introduced documentation
>  - 8347712: IllegalStateException on multithreaded ZipFile access with 
> non-UTF8 charset

src/java.base/share/classes/java/util/zip/ZipCoder.java line 47:

> 45:  * and can be used concurrently by multiple threads. All other {@code 
> ZipCoder} instances
> 46:  * are not thread-safe and external synchronization is required by 
> callers, if those
> 47:  * instances are to be used concurrently by multiple threads.

I think you can trim this down to just say that the UTF-8 ZipCoder is thread 
safe, ZipCoder for other charsets require synchronization.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2008822662

Reply via email to