On Wed, 12 Mar 2025 18:22:57 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:
>>> This seems mostly used to determine which ZipCoder to pick. Could we fold >>> this into the method, calling it zipCoderForPos and make it just return the >>> ZipCoder, like we had in Source? >> >> I intentionally removed the `zipCoderForPos()` method and instead introduced >> this static method to avoid the case where the code ends up calling this >> method and then stores the returned `ZipCoder` (like was being done in >> `Source`). Instead with this new method, it now allows the call sites to >> determine if a UTF8 `ZipCoder` is needed or a non-UTF8 one and then decide >> where to get the non-UTF8 `ZipCoder` from. If the call site is a instance >> method in `ZipFile`, then it was use the `ZipFile`'s `zipCoder` field and if >> the call site is within `Source`, when the `Source` is being instantiated >> then it constructs a non-UTF8 `ZipCoder` afresh. That level of detail is >> better dealt at the call site instead of a method like `zipCoderForPos()`. > > Hello Jaikiran, > > I'm not convinced dropping `zipCoderForPos` is a step forward: > > * `zipCoderForPos` included a short-circuit optimization logic which skipped > reading the CENFLG field in the case where the opening charset was UTF-8 > (which it commonly is, and always for JarFile). This optimization seems lost > in the currrent state of this PR and could impact lookup performance. It > would be strange to repeat this optimization at all call sites. > * The only thing differing between callsites seems to be which ZipCoder to > use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly > be passed in as a parameter, and will de-duplicate logic at the call sites. > * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy > initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems > questionable: First, ZipCoder is already lazy in the way it initializes > encoders and decoders. An unused ZipCoder is just a thin wrapper around a > Charset and should not incur significant cost until it gets used. Second, the > `ZipFile` constructor actually constructs a `ZipCoder` instance, would it not > be better to just pass this down to initCEN as a parameter? That would avoid > the cost of initializing encoders and decoders twice for the common case of > single-threaded `ZipFile` access, once for initCEN, then again on the first > lookup. > > Here's a patch for a quick implementation of the above on top of your PR. > (Code speeks better than words some times :-) > > This retains the CENFLG optimization, simplifies logic at call sites and > prevents duplicated initialization of ZipCoder beteween initCEN and lookups: > > > Index: src/java.base/share/classes/java/util/zip/ZipFile.java > IDEA additional info: > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP > <+>UTF-8 > =================================================================== > diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java > b/src/java.base/share/classes/java/util/zip/ZipFile.java > --- a/src/java.base/share/classes/java/util/zip/ZipFile.java (revision > 935b04ed00522aa92105292baa0693c55b1356ae) > +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java (date > 1741800197915) > @@ -201,8 +201,8 @@ > this.fileName = file.getName(); > long t0 = System.nanoTime(); > > - this.res = new CleanableResource(this, charset, file, mode); > this.zipCoder = ZipCoder.get(charset); > + this.res = new CleanableResource(this, charset, zipCoder, file, > mode); > > PerfCounter... Hello Eirik, > zipCoderForPos included a short-circuit optimization logic which skipped > reading the CENFLG field in the case where the opening charset was UTF-8 > (which it commonly is, and always for JarFile). This optimization seems lost > in the currrent state of this PR That's a good point. I'll update this PR shortly, borrowing from your proposed diff/patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1993283080