On Thu, 21 Jul 2022 17:45:39 GMT, Mark Reinhold <m...@openjdk.org> wrote:
>> Volker Simonis has refreshed the contents of this pull request, and previous >> commits have been removed. Incremental views are not available. > > First, I’ll observe that there is no specification conflict here. The > specifications of `InflaterInputStream::read(byte[] b, int off, int len)` and > `Inflater::inflate(byte[] b, int off, int len)` are not in conflict at all. > The latter is simply weaker than the former. > > What is true is that if you build the JDK in the default manner, without any > patches, then what you wind up with is an `InflaterInputStream::read(...)` > implementation that satisfies the inherited `InputStream::read(...)` > specification, including the constraint that a read operation must not > _scribble_ in the caller’s output buffer, _i.e._, modify any byte in that > buffer beyond the returned byte count. (That constraint is, so far as we > know, satisfied by every `InputStream` subclass defined in the JDK.) > > A default JDK build’s `InflaterInputStream::read(...)` method doesn’t > scribble because > > - The implementation of `InflaterInputStream::read(...)` assumes that the > implementation of `Inflater::inflate(...)` doesn’t scribble, and > > - The implementation of `Inflater::inflate(...)` satisfies that assumption > because it invokes the corresponding function in the [standard native Zip > implementation](https://zlib.net/), which itself doesn’t scribble. > > So, not only does the specification forbid `InflaterInputStream::read(...)` > from scribbling, but in the official Java SE Reference Implementation — which > is simply a default JDK build — that method does not scribble. (Perhaps by > accident, but that is immaterial here.) > > Therefore, if in your own JDK build you substitute an alternative Zip > implementation whose `inflate` function does scribble then you wind up with a > JDK that both violates the specification and behaves differently than the RI. > It is incompatible. If you use such a JDK in production then you risk > breaking existing code. > > Now, compatibility is one of the key values of the Java Platform, but > performance is not unimportant. Is there something we can do to accommodate > [alternative, faster, scribbling Zip implementations][faster-zlib]? > > @simonis lists three options in the [issue]. I’ll cast the options somewhat > differently. > > 1. Do not change the specification in any way. The specification text that > forbids scribbling dates back to Java 1.1, released over 25 years ago. > (Arguably that text over-specifies the `InputStream::read(...)` method, but > that is also immaterial.) It’s all too possible that existing code — no > matter how odd or poorly written we think it might be — depends upon this > constraint. > > With this option, a JDK build that uses a scribbling Zip implementation > would have to buffer that implementation’s output and copy only the returned > number of bytes into the caller’s buffer. This would degrade any performance > benefit. > > 2. Weaken the specification of `InflaterInputStream` — and, thereby, that > of its subclasses `GZipInputStream` and `ZipInputStream` — to allow > scribbling. This is essentially @simonis’s initial suggestion, as embodied > in [63ae357] earlier in this PR. > > This option risks breaking existing code that depends upon the > 25-year-old promise that `InflaterInputStream::read(...)` will not scribble. > Changing the specification of an indirect subclass of `InputStream`, > moreover, is apt to go unnoticed by many developers, who routinely deal with > instances of `InputStream` without any knowledge of those instances’ concrete > classes. (@jaikiran gives one example above.) > > 3. Weaken the specification of the `InputStream::read(...)` method to allow > scribbling, as [suggested by @jddarcy in the CSR][jddarcy]. > > This option is at least as risky to existing code as the second option, > though it’s marginally more likely that developers writing new code will > notice the specification change. > > What’s worse, however, is that this option gives developers writing new > subclasses of `InputStream` permission to deviate from both that class’s > longstanding specification and the actual behavior of every `InputStream` > subclass defined in the JDK. In the long term, that could wreak subtle bugs > throughout the ecosystem, even breaking code that does absolutely nothing > with Zip-related input streams. > > In favor of both the second and third options is @simonis’s statement above > that his employer (Amazon) has used JDK builds with a scribbling Zip > implementation in production for two years. He has further elaborated > (privately) that this includes hundreds of services built with many of the > usual popular frameworks and libraries (Spring, Hibernate, ASM, etc.). This > suggests that the practical impact, out in the wild, of allowing scribbling > Zip implementations is acceptable. > > On that basis I am — just barely — comfortable with the second option, > _i.e._, weakening the specification of `InflaterInputStream` to allow > scribbling, thereby contradicting the specification of its `InputStream` > superclass. > > Explicitly specifying a subclass in a way that’s inconsistent with the > specification of one of its superclasses is generally a really bad idea. It > violates the [Liskov Substitution Principle][lsp], which enables instances of > a subclass to be substituted freely for instances of one of its superclasses. > Thus we do not condone such changes lightly. There are, however, precedents > in a handful of special cases in the Java Platform; _e.g._, in the > [`java.util.IdentityHashMap`][ihm] and [`java.sql.Timestamp`][ts] classes. > > The third option above would not violate the Liskov substitution principle, > but in my view could prove more harmful in the long run. > > Recommendations: > > - Revert this PR back to [63ae357], and make corresponding updates to the > [issue] and the [CSR]. > > - Additionally, find every method in every `java.*` class that’s declared > to return an `InputStream` (_e.g._, `ZipFile::getInputStream(...)` but > actually returns an instance of `InflaterInputStream` or one of its > subclasses. Add an API note to each such method to let developers know that > the returned input stream might scribble, per @jaikiran’s suggestion above. > These notes can all use the same text, linking to the revised > `InflaterInputStream::read(...)` specification. > > - Revise the summary of this PR, the issue, and the CSR to reflect the true > nature of the proposed change rather than assert a nonexistent conflict. > Consider “Weaken the `InflaterInputStream` specification in order to allow > faster Zip implementations”. > > - Change the issue type from a bug — which it’s not — to an enhancement. > > - Create a related release-note issue, as @RogerRiggs suggests above. > > > [faster-zlib]: > https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium > [issue]: https://bugs.openjdk.org/browse/JDK-8282648 > [63ae357]: > https://github.com/openjdk/jdk/pull/7986/commits/63ae3572da674a0926bb9d0adcea88d89bb56707 > [jddarcy]: > https://bugs.openjdk.org/browse/JDK-8283758?focusedCommentId=14492930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492930 > [CSR]: https://bugs.openjdk.org/browse/JDK-8283758 > [lsp]: https://en.wikipedia.org/wiki/Liskov_substitution_principle > [ihm]: > https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/IdentityHashMap.html > [ts]: > https://docs.oracle.com/en/java/javase/18/docs/api/java.sql/java/sql/Timestamp.html @mbreinhold, @LanceAndersen thanks for your reviews. I'm waiting for the CSR approval before pushing. ------------- PR: https://git.openjdk.org/jdk/pull/7986