Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Jaikiran Pai Wed, 19 Mar 2025 07:38:05 -0700

On Thu, 13 Mar 2025 11:03:00 GMT, Jaikiran Pai <j...@openjdk.org> wrote:


>> Hello Jaikiran,
>> 
>> I'm not convinced dropping `zipCoderForPos` is a step forward:
>> 
>> * `zipCoderForPos` included a short-circuit optimization logic which skipped 
>> reading the CENFLG field in the case where the opening charset was UTF-8 
>> (which it commonly is, and always for JarFile). This optimization seems lost 
>> in the currrent state of this PR and could impact lookup performance. It 
>> would be strange to repeat this optimization at all call sites.
>> * The only thing differing between callsites seems to be which ZipCoder to 
>> use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly 
>> be passed in as a parameter, and will de-duplicate logic at the call sites.
>> * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy 
>> initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems 
>> questionable: First, ZipCoder is already lazy in the way it initializes 
>> encoders and decoders. An unused ZipCoder is just a thin wrapper around a 
>> Charset and should not incur significant cost until it gets used. Second, 
>> the `ZipFile` constructor actually constructs a `ZipCoder` instance, would 
>> it not be better to just pass this down to initCEN as a parameter? That 
>> would avoid the cost of initializing encoders and decoders twice for the 
>> common case of single-threaded `ZipFile` access, once for initCEN, then 
>> again on the first lookup.  
>> 
>> Here's a patch for a quick implementation of the above on top of your PR. 
>> (Code speeks better than words some times :-)
>> 
>> This retains the CENFLG optimization, simplifies logic at call sites and 
>> prevents duplicated initialization of ZipCoder beteween initCEN and lookups:
>> 
>> 
>> Index: src/java.base/share/classes/java/util/zip/ZipFile.java
>> IDEA additional info:
>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> <+>UTF-8
>> ===================================================================
>> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java 
>> b/src/java.base/share/classes/java/util/zip/ZipFile.java
>> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java (revision 
>> 935b04ed00522aa92105292baa0693c55b1356ae)
>> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java (date 
>> 1741800197915)
>> @@ -201,8 +201,8 @@
>>          this.fileName = file.getName();
>>          long t0 = System.nanoTime();
>>  
>> -        this.res = new CleanableResource(this, charset, file, mode);
>>          this.zipCoder = ZipCoder.get(charset);
>> +        this.res = new CleanableResource(this...
>
> Hello Eirik,
> 
>> zipCoderForPos included a short-circuit optimization logic which skipped 
>> reading the CENFLG field in the case where the opening charset was UTF-8 
>> (which it commonly is, and always for JarFile). This optimization seems lost 
>> in the currrent state of this PR
> 
> That's a good point. I'll update this PR shortly, borrowing from your 
> proposed diff/patch.

Hello Eirik, I have now updated the PR to borrow a major part of the diff that 
you provided. This now brings back the `zipCoderFor()` method and at the same 
time, we don't end up holding on to the `ZipCoder` in the the `Source` class, 
which is expected to be thread safe.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2003488113

Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Reply via email to