Can I please get a review of this change which proposes to fix a file
descriptor leak in `sun.net.www.protocol.file.FileURLConnection`?
The bug affects all JDK versions since Java 8 through mainline JDK.
Furthermore, the issue isn't in any way related to "jar:" URLs or the JAR
resources caching involved in the `JarURLConnection`.
`sun.net.www.protocol.file.FileURLConnection` is a `java.net.URLConnection`.
For this issue, the APIs on `URLConnection` that are of interest are
`connect()`, header related APIs (the getHeaderXXXX(), getLastModified() and
similar APIs) and `getInputStream()`.
The existing implementation in `FileURLConnection` in its `connect()`
implementation does readability checks on the underlying `File` instance. If
the `File` is a directory then the readability check is done by verifying that
a `File.list()` call does not return null. On the other hand, if the File is
not a directory, then connect() constructs a temporary
`java.io.FileInputStream` for the File and lets the FileInputStream's
constructor implementation do the necessary readability checks. In either case,
if the readability checks fail, then an IOException is thrown from this method
and the FileURLConnection stays unconnected.
One important detail of the implementation in `FileURLConnection.connect()` is
that the `FileInputStream` that it creates for the regular-file readability
check, it keeps it open when it returns from `connect()`. `connect()` itself
doesn't return any value, so this `FileInputStream` that was created for
readability check remains open without the application having access to it,
unless the application calls `(File)URLConnection.getInputStream()`. The
implementation of `FileURLConnection.getInputStream()` returns this previously
constructed `InputStream` if `connect()` had previously suceeded with its
readability checks. The application, as usual, is then expected to `close()`
that `InputStream` that was returned from `URLConnection.getInputStream()`.
This is the "normal" case, where the application at some point calls the
`URLConnection.getInputStream()` and closes that stream.
As noted previously, `URLConnection` has a few other APIs, for example the APIs
that provide header values. An example:
Path regularFile = Path.of("hello.txt");
URLConnection conn = regularFile.toUri().toURL().openConnection();
// either of the following header related APIs
long val = conn.getLastModified();
String val = conn.getHeaderField("foobar");
// ... some other header related APIs
The issue/leak arises in these cases. Application code constructs a
`URLConnection` and invokes these trivial APIs and neither does an explicit
`connect()` nor calls `getInputStream()`, because it doesn't have to. In code
like this, the implementation of `FileURLConnection` internally in the
implementation of `getLastModified()` and other similar APIs, does a
`connect()`. As explained above, the implementation of `connect()` then creates
and leaves around a `FileInputStream` after doing readability checks. In this
above application code, the `InputStream` never reaches the application code
nor does it closed, thus leading to the leak.
The specification of `URLConnection` states:
> The actual connection to the remote object is made, using the {@link
> #connect() connect} method.
> The remote object becomes available. The header fields and the contents of
> the remote object can be accessed.
> ...
> Invoking the {@code close()} methods on the {@code InputStream} or {@code
> OutputStream} of an
> {@code URLConnection} after a request may free network resources associated
> with this
> instance, unless particular protocol specifications specify different
> behaviours
> for it.
It can be argued that it seems to expect that the application call
`URLConnection.getInputStream()` and then close() it to close the resources,
even in the case where it isn't interested in the `InputStream` or its
contents. But, I think, it's a bit odd to be expecting applications to be doing
that, if at all that's what is expected. So the change in this PR proposes to
address the leak that can arise when the application never calls
`URLConnection.getInputStream()`.
The change in this PR, keeps the readability checks in the `connect()` method.
However, the `FileInputStream` that is opened to do the readability check is
closed before returning from the `connect()` method. That way, there is no
`InputStream` lying around. When/if the application calls
`URLConnection.getInputStream()` then the implementation in that method has
been updated to construct and return the `FileInputStream`. This does mean that
there's a change in behaviour for the following case (I consider it a corner
case):
- URLConnection is constructed for a "foo.txt" file which is readable
- URLConnection.getLastModified() (or similar header related APIs) is invoked
on that URLConnection. Internally that results in a connect() and file
readability check and that succeeds.
- Behind the scenes on the filesystem (or from some other part of the
application), "foo.txt" file's permissions are modified to remove "read"
permission.
- URLConnection.getInputStream() is invoked on that previously constructed (and
connected) URLConnection.
Before the changes in this PR, the above scenario would result in the
`URLConnection.getInputStream()` returning normally a `InputStream` instance
(that was cached during the connect() call). The application would then be able
to normally/successfully read content from that `InputStream`. With the
proposed change in this PR, the `URLConnection.getInputStream()` call will
throw a `FileNotFoundException`, when it attempts to construct the
`FileInputStream` and noticing that the file doesn't have "read" permission.
`URLConnection.getInputStream()` is specified to throw an `IOException`, so the
exception (and its type) itself isn't a problem. It however is still a change
in behaviour. Having said that, my opinion is that changing file permissions in
between API calls on a `URLConnection` is a corner case and I think it might be
OK if there is a change in behaviour here (along with a release note maybe),
especially since I think the proposed change prevents a leak in a "normal"
usage of t
he `URLConnection` APIs in the application code (and also helps the internal
implementation in `FileURLConnection` to have a more clearly demarcated
lifetime for the `InputStream` instance).
Additional care has been taken to make sure that no other change in behaviour
is introduced. 2 new jtreg tests have been introduced. The
`FileURLConnStreamLeakTest` is a regression test which reproduces the leak and
verifies the fix - several test methods in this test will fail on Windows
without the fix and will pass with the fix. The `GetInputStreamTest` is a new
test which has been introduced to help improve the coverage of testing for the
`InputStream` returned from the `FileURLConnection`.
tier1, tier2 and tier3 tests pass with this change.
-------------
Commit messages:
- 8367561: fix inputstream leak
- 8367561: introduce tests
Changes: https://git.openjdk.org/jdk/pull/27633/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27633&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8367561
Stats: 515 lines in 4 files changed: 483 ins; 11 del; 21 mod
Patch: https://git.openjdk.org/jdk/pull/27633.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27633/head:pull/27633
PR: https://git.openjdk.org/jdk/pull/27633