Thank you Chen! Filed https://bugs.openjdk.org/browse/JDK-8376842 to track this enhancement.

Eirik.

On Thu, Jan 29, 2026 at 7:27 PM Chen Liang <[email protected]> wrote:

> Hello Eirik,
> I strongly agree with your proposal. I see such a change has low risk
> given ZipCoder is an internal class.
>
> Regards,
> Chen
>
> ------------------------------
> *From:* core-libs-dev <[email protected]> on behalf of Eirik
> Bjørsnøs <[email protected]>
> *Sent:* Wednesday, January 28, 2026 3:26 AM
> *To:* core-libs-dev <[email protected]>
> *Subject:* RFD: Reorganize ZipCoder such that UTF8 is handled by the base
> class
>
> Hi,
>
> Bringing this up on core-libs-dev such that the motivation can be
> explained/discussed here and any future PR can focus on actual code changes.
>
> Summary:
>
> Reorganize the ZipCoder class hierarchy to let the base class handle UTF-8
> and the subclass handle arbitrary Charsets. This makes the design better
> match the ZIP specification and how ZIP files are used in the real world,
> and additionally has some benefits in code quality and performance.
>
> Motivation:
>
> The ZipCoder class has been central to many ZipFile performance
> improvements in recent years. Many optimizations are encoding-specific and
> encapsulating these concerns makes a lot of sense.
>
> Currently, the base ZipCoder instance supports any given Charset. Then, a
> subclass UTF8ZipCoder provides higher performance optimizations specific to
> UTF-8.
>
> However, real-world use of the ZipFile API defaults to UTF-8. The ZIP
> specification long ago introduced a flag to explicitly indicate that
> entries are encoded using UTF-8. The JAR specification has mandated UTF-8
> since the beginning. Any use of non-UTF-8 ZIP files is increasingly niche
> and belongs in the legacy zone.
>
> The current UTF8ZipCoder is stateless and documented as thread safe, while
> the base class ZipCoder is not. As a subclass of ZipCoder, UTF8ZipCoder does
> however inherit CharsetEncoder and CharsetDecoder state fields from its
> super class and it needs to pass a UTF-8 Charset to its parent, without
> really using it. This makes state and thread safety harder to reason about.
>
> Since UTF8ZipCoder is always needed, the JVM must always load it along
> with the base class ZipCoder. Apart from loading an extra class, this
> prevents the JVM from seeing calls to ZipCoder methods as monomorphic.
>
> A draft implementation of this change indicates a ~3% performance win on
> ZipFile lookups in ZipFileGetEntry, probably explained by the compiler
> seeing only one instance of ZipCoder being loaded.
>
> Solution:
>
> Switch the class hierarchy of ZipCoder around such that the base class
> handles UTF-8. Introduce a new subclass CharsetZipCoder to handle legacy
> non-UTF-8 ZIP files. Move the Charset, CharsetEncoder, CharsetDecoder fields
> to this subclass. Update code comments to reflect the changes.
>
> Risks:
>
> This should be a pure refactoring, mostly moving code around. Most changes
> can be performed in-place, such that side by side review will mostly
> reflect indentation changes. We have good test coverage for UTF-8 and
> non-UTF-8 ZIP files to help us catch regressions.
>
> If I see support for this proposal, I'll be happy to submit a PR with the
> actual changes.
>
> Cheers,
> Eirik :-)
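
For readers skimming the thread, the hierarchy described under "Solution" in the quoted proposal could look roughly like the sketch below. This is only an illustration under stated assumptions: the names SketchZipCoder and SketchCharsetZipCoder, the get() factory and all method signatures are hypothetical stand-ins, not the actual java.util.zip.ZipCoder code, and the real UTF-8 fast paths and error handling are not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the proposed base class: UTF-8 only, no fields,
// so a single shared instance can be used from any thread.
class SketchZipCoder {
    static final SketchZipCoder UTF8 = new SketchZipCoder();

    // Hypothetical factory: the common UTF-8 case never touches the subclass.
    static SketchZipCoder get(Charset cs) {
        return StandardCharsets.UTF_8.equals(cs) ? UTF8 : new SketchCharsetZipCoder(cs);
    }

    boolean isUTF8() {
        return true;
    }

    // Simplification: malformed bytes are replaced here rather than rejected.
    String toString(byte[] bytes, int off, int len) {
        return new String(bytes, off, len, StandardCharsets.UTF_8);
    }

    byte[] getBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical stand-in for the proposed CharsetZipCoder: the Charset,
// CharsetDecoder and CharsetEncoder state lives only in this subclass.
final class SketchCharsetZipCoder extends SketchZipCoder {
    private final Charset cs;
    private CharsetDecoder dec; // lazily created, not thread safe
    private CharsetEncoder enc; // lazily created, not thread safe

    SketchCharsetZipCoder(Charset cs) {
        this.cs = cs;
    }

    @Override
    boolean isUTF8() {
        return false;
    }

    @Override
    String toString(byte[] bytes, int off, int len) {
        try {
            return decoder().decode(ByteBuffer.wrap(bytes, off, len)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
    }

    @Override
    byte[] getBytes(String s) {
        try {
            ByteBuffer bb = encoder().encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("unmappable character", e);
        }
    }

    private CharsetDecoder decoder() {
        if (dec == null) {
            dec = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
        }
        return dec;
    }

    private CharsetEncoder encoder() {
        if (enc == null) {
            enc = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
        }
        return enc;
    }
}
```

In this arrangement, code that only ever opens UTF-8 ZIP files would call methods on the base class alone, so the stateful subclass need not be loaded at all, which is the property the proposal ties to monomorphic call sites. Whether the actual change is structured this way is for the eventual PR to show.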
