On Thu, 13 Mar 2025 11:03:00 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
>> Hello Jaikiran, >> >> I'm not convinced dropping `zipCoderForPos` is a step forward: >> >> * `zipCoderForPos` included a short-circuit optimization logic which skipped >> reading the CENFLG field in the case where the opening charset was UTF-8 >> (which it commonly is, and always for JarFile). This optimization seems lost >> in the currrent state of this PR and could impact lookup performance. It >> would be strange to repeat this optimization at all call sites. >> * The only thing differing between callsites seems to be which ZipCoder to >> use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly >> be passed in as a parameter, and will de-duplicate logic at the call sites. >> * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy >> initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems >> questionable: First, ZipCoder is already lazy in the way it initializes >> encoders and decoders. An unused ZipCoder is just a thin wrapper around a >> Charset and should not incur significant cost until it gets used. Second, >> the `ZipFile` constructor actually constructs a `ZipCoder` instance, would >> it not be better to just pass this down to initCEN as a parameter? That >> would avoid the cost of initializing encoders and decoders twice for the >> common case of single-threaded `ZipFile` access, once for initCEN, then >> again on the first lookup. >> >> Here's a patch for a quick implementation of the above on top of your PR. >> (Code speeks better than words some times :-) >> >> This retains the CENFLG optimization, simplifies logic at call sites and >> prevents duplicated initialization of ZipCoder beteween initCEN and lookups: >> >> >> Index: src/java.base/share/classes/java/util/zip/ZipFile.java >> IDEA additional info: >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP >> <+>UTF-8 >> =================================================================== >> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java >> b/src/java.base/share/classes/java/util/zip/ZipFile.java >> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java (revision >> 935b04ed00522aa92105292baa0693c55b1356ae) >> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java (date >> 1741800197915) >> @@ -201,8 +201,8 @@ >> this.fileName = file.getName(); >> long t0 = System.nanoTime(); >> >> - this.res = new CleanableResource(this, charset, file, mode); >> this.zipCoder = ZipCoder.get(charset); >> + this.res = new CleanableResource(this... > > Hello Eirik, > >> zipCoderForPos included a short-circuit optimization logic which skipped >> reading the CENFLG field in the case where the opening charset was UTF-8 >> (which it commonly is, and always for JarFile). This optimization seems lost >> in the currrent state of this PR > > That's a good point. I'll update this PR shortly, borrowing from your > proposed diff/patch. Hello Eirik, I have now updated the PR to borrow a major part of the diff that you provided. This now brings back the `zipCoderFor()` method and at the same time, we don't end up holding on to the `ZipCoder` in the the `Source` class, which is expected to be thread safe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2003488113