Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Jaikiran Pai Thu, 13 Mar 2025 04:06:42 -0700

On Wed, 12 Mar 2025 18:22:57 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:


>>> This seems mostly used to determine which ZipCoder to pick. Could we fold 
>>> this into the method, calling it zipCoderForPos and make it just return the 
>>> ZipCoder, like we had in Source?
>> 
>> I intentionally removed the `zipCoderForPos()` method and instead introduced 
>> this static method to avoid the case where the code ends up calling this 
>> method and then stores the returned `ZipCoder` (like was being done in 
>> `Source`). Instead with this new method, it now allows the call sites to 
>> determine if a UTF8 `ZipCoder` is needed or a non-UTF8 one and then decide 
>> where to get the non-UTF8 `ZipCoder` from. If the call site is a instance 
>> method in `ZipFile`, then it was use the `ZipFile`'s `zipCoder` field and if 
>> the call site is within `Source`, when the `Source` is being instantiated 
>> then it constructs a non-UTF8 `ZipCoder` afresh. That level of detail is 
>> better dealt at the call site instead of a method like `zipCoderForPos()`.
>
> Hello Jaikiran,
> 
> I'm not convinced dropping `zipCoderForPos` is a step forward:
> 
> * `zipCoderForPos` included a short-circuit optimization logic which skipped 
> reading the CENFLG field in the case where the opening charset was UTF-8 
> (which it commonly is, and always for JarFile). This optimization seems lost 
> in the currrent state of this PR and could impact lookup performance. It 
> would be strange to repeat this optimization at all call sites.
> * The only thing differing between callsites seems to be which ZipCoder to 
> use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly 
> be passed in as a parameter, and will de-duplicate logic at the call sites.
> * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy 
> initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems 
> questionable: First, ZipCoder is already lazy in the way it initializes 
> encoders and decoders. An unused ZipCoder is just a thin wrapper around a 
> Charset and should not incur significant cost until it gets used. Second, the 
> `ZipFile` constructor actually constructs a `ZipCoder` instance, would it not 
> be better to just pass this down to initCEN as a parameter? That would avoid 
> the cost of initializing encoders and decoders twice for the common case of 
> single-threaded `ZipFile` access, once for initCEN, then again on the first 
> lookup.  
> 
> Here's a patch for a quick implementation of the above on top of your PR. 
> (Code speeks better than words some times :-)
> 
> This retains the CENFLG optimization, simplifies logic at call sites and 
> prevents duplicated initialization of ZipCoder beteween initCEN and lookups:
> 
> 
> Index: src/java.base/share/classes/java/util/zip/ZipFile.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java 
> b/src/java.base/share/classes/java/util/zip/ZipFile.java
> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java  (revision 
> 935b04ed00522aa92105292baa0693c55b1356ae)
> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java  (date 
> 1741800197915)
> @@ -201,8 +201,8 @@
>          this.fileName = file.getName();
>          long t0 = System.nanoTime();
>  
> -        this.res = new CleanableResource(this, charset, file, mode);
>          this.zipCoder = ZipCoder.get(charset);
> +        this.res = new CleanableResource(this, charset, zipCoder, file, 
> mode);
>  
>          PerfCounter...

Hello Eirik,

> zipCoderForPos included a short-circuit optimization logic which skipped 
> reading the CENFLG field in the case where the opening charset was UTF-8 
> (which it commonly is, and always for JarFile). This optimization seems lost 
> in the currrent state of this PR

That's a good point. I'll update this PR shortly, borrowing from your proposed 
diff/patch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1993283080

Re: RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Reply via email to