On Wed, 19 Mar 2025 14:37:33 GMT, Jaikiran Pai <j...@openjdk.org> wrote:

>> Can I please get a review of this change which proposes to fix an issue 
>> `java.util.zip.ZipFile` which would cause failures when multiple instances 
>> of `ZipFile` using non-UTF8 `Charset` were operating against the same 
>> underlying ZIP file? This addresses 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to 
>> indicate that the entry name and comment are encoded using UTF8. A 
>> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This 
>> `Charset` (which defaults to UTF-8) gets used for decoding entry names and 
>> comments for non-UTF8 entries.
>> 
>> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by 
>> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given 
>> `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
>> 
>> The internal implementation of `ZipFile` maintains a cache of 
>> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and 
>> during construction, uses a `ZipCoder` for parsing the ZIP entries and once 
>> constructed holds on to the parsed ZIP structure. Multiple instances of a 
>> `ZipFile` which all correspond to the same ZIP file on the filesystem, share 
>> a single instance of `Source` (after the `Source` has been constructed and 
>> cached). Although `ZipFile` instances aren't expected to be thread-safe, the 
>> fact that multiple different instances of `ZipFile` could be sharing the 
>> same instance of `Source` in concurrent threads, mandates that the `Source` 
>> must be thread-safe.
>> 
>> In Java 15, we did a performance optimization through 
>> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we 
>> started holding on to the `ZipCoder` instance (corresponding to the 
>> `Charset` provided during `ZipFile` construction) in the `Source`. This 
>> stored `ZipCoder` was then used for `ZipFile` operations when working with 
>> the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not 
>> thread-safe and as a result, any usages of `ZipCoder` in the `Source` makes 
>> `Source` not thread-safe too. That effectively violates the requirement that 
>> `Source` must be thread-safe to allow for its usage in multiple different 
>> `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail 
>> in unexpected ways like the one shown in the linked 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a 
>> instance field of `Source`. Instead the `ZipCoder` is now mainta...
>
> Jaikiran Pai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains nine additional 
> commits since the last revision:
> 
>  - Eirik's suggestion - update test method comment
>  - rename entryNameCharset to charset in the test
>  - Eirik's suggestion - update test to even call ZipFile.getEntry()
>  - Eirik's inputs - replace useUTF8Coder() with zipCoderFor()
>  - merge latest from master branch
>  - add code comment
>  - rename field
>  - tiny typo fix in newly introduced documentation
>  - 8347712: IllegalStateException on multithreaded ZipFile access with 
> non-UTF8 charset

src/java.base/share/classes/java/util/zip/ZipFile.java line 384:

> 382:      * @param fallback the fallback ZipCoder to return if the entry 
> doesn't require UTF-8
> 383:      */
> 384:     private static ZipCoder zipCoderFor(final byte[] cen, final int pos, 
> final ZipCoder fallback) {

Have you tried putting an instance method on ZipFile instead as it has access 
to the zip coder. That would give you better abstraction and avoid needing to 
introduce "fallback" as that is confusing to see here.

src/java.base/share/classes/java/util/zip/ZipFile.java line 1146:

> 1144:     static record EntryPos(String name, int pos) {}
> 1145: 
> 1146:     // Implementation note: This class must be thread-safe.

I think you mean "This class is thread safe".

src/java.base/share/classes/java/util/zip/ZipFile.java line 1149:

> 1147:     // Each instance of Source could be shared between multiple 
> different instances of ZipFile.
> 1148:     // Although ZipFile isn't thread-safe, the fact that separate 
> instances of ZipFile could
> 1149:     // still concurrently use the same Source instance, mandates that 
> Source must be thread-safe.

If you assert previously that the class is thread safe then I think you can 
drop this confusing comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2008823685
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2008823754
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2008823982

Reply via email to