Can I please get a review of this change which proposes to fix a file 
descriptor leak in `sun.net.www.protocol.file.FileURLConnection`?

The bug affects all JDK versions since Java 8 through mainline JDK. 
Furthermore, the issue isn't in any way related to "jar:" URLs or the JAR 
resources caching involved in the `JarURLConnection`.

`sun.net.www.protocol.file.FileURLConnection` is a `java.net.URLConnection`. 
For this issue, the APIs on `URLConnection` that are of interest are 
`connect()`, header related APIs (the getHeaderXXXX(), getLastModified() and 
similar APIs) and `getInputStream()`.

The existing implementation in `FileURLConnection` in its `connect()` 
implementation does readability checks on the underlying `File` instance. If 
the `File` is a directory then the readability check is done by verifying that 
a `File.list()` call does not return null. On the other hand, if the File is 
not a directory, then connect() constructs a temporary 
`java.io.FileInputStream` for the File and lets the FileInputStream's 
constructor implementation do the necessary readability checks. In either case, 
if the readability checks fail, then an IOException is thrown from this method 
and the FileURLConnection stays unconnected.

One important detail of the implementation in `FileURLConnection.connect()` is 
that the `FileInputStream` that it creates for the regular-file readability 
check, it keeps it open when it returns from `connect()`. `connect()` itself 
doesn't return any value, so this `FileInputStream` that was created for 
readability check remains open without the application having access to it, 
unless the application calls `(File)URLConnection.getInputStream()`. The 
implementation of `FileURLConnection.getInputStream()` returns this previously 
constructed `InputStream` if `connect()` had previously suceeded with its 
readability checks. The application, as usual, is then expected to `close()` 
that `InputStream` that was returned from `URLConnection.getInputStream()`. 
This is the "normal" case, where the application at some point calls the 
`URLConnection.getInputStream()` and closes that stream.

As noted previously, `URLConnection` has a few other APIs, for example the APIs 
that provide header values. An example:


Path regularFile = Path.of("hello.txt");
URLConnection conn = regularFile.toUri().toURL().openConnection();
// either of the following header related APIs
long val = conn.getLastModified();
String val = conn.getHeaderField("foobar");
// ... some other header related APIs


The issue/leak arises in these cases. Application code constructs a 
`URLConnection` and invokes these trivial APIs and neither does an explicit 
`connect()` nor calls `getInputStream()`, because it doesn't have to. In code 
like this, the implementation of `FileURLConnection` internally in the 
implementation of `getLastModified()` and other similar APIs, does a 
`connect()`. As explained above, the implementation of `connect()` then creates 
and leaves around a `FileInputStream` after doing readability checks. In this 
above application code, the `InputStream` never reaches the application code 
nor does it closed, thus leading to the leak.


The specification of `URLConnection` states:

> The actual connection to the remote object is made, using the {@link 
> #connect() connect} method.
> The remote object becomes available. The header fields and the contents of 
> the remote object can be accessed.
> ...
> Invoking the {@code close()} methods on the {@code InputStream} or {@code 
> OutputStream} of an
> {@code URLConnection} after a request may free network resources associated 
> with this
> instance, unless particular protocol specifications specify different 
> behaviours
> for it.

It can be argued that it seems to expect that the application call 
`URLConnection.getInputStream()` and then close() it to close the resources, 
even in the case where it isn't interested in the `InputStream` or its 
contents. But, I think, it's a bit odd to be expecting applications to be doing 
that, if at all that's what is expected. So the change in this PR proposes to 
address the leak that can arise when the application never calls 
`URLConnection.getInputStream()`.

The change in this PR, keeps the readability checks in the `connect()` method. 
However, the `FileInputStream` that is opened to do the readability check is 
closed before returning from the `connect()` method. That way, there is no 
`InputStream` lying around. When/if the application calls 
`URLConnection.getInputStream()` then the implementation in that method has 
been updated to construct and return the `FileInputStream`. This does mean that 
there's a change in behaviour for the following case (I consider it a corner 
case):

- URLConnection is constructed for a "foo.txt" file which is readable
- URLConnection.getLastModified() (or similar header related APIs) is invoked 
on that URLConnection. Internally that results in a connect() and file 
readability check and that succeeds.
- Behind the scenes on the filesystem (or from some other part of the 
application), "foo.txt" file's permissions are modified to remove "read" 
permission.
- URLConnection.getInputStream() is invoked on that previously constructed (and 
connected) URLConnection.

Before the changes in this PR, the above scenario would result in the 
`URLConnection.getInputStream()` returning normally a `InputStream` instance 
(that was cached during the connect() call). The application would then be able 
to normally/successfully read content from that `InputStream`. With the 
proposed change in this PR, the `URLConnection.getInputStream()` call will 
throw a `FileNotFoundException`, when it attempts to construct the 
`FileInputStream` and noticing that the file doesn't have "read" permission. 
`URLConnection.getInputStream()` is specified to throw an `IOException`, so the 
exception (and its type) itself isn't a problem. It however is still a change 
in behaviour. Having said that, my opinion is that changing file permissions in 
between API calls on a `URLConnection` is a corner case and I think it might be 
OK if there is a change in behaviour here (along with a release note maybe), 
especially since I think the proposed change prevents a leak in a "normal" 
usage of t
 he `URLConnection` APIs in the application code (and also helps the internal 
implementation in `FileURLConnection` to have a more clearly demarcated 
lifetime for the `InputStream` instance).

Additional care has been taken to make sure that no other change in behaviour 
is introduced. 2 new jtreg tests have been introduced. The 
`FileURLConnStreamLeakTest` is a regression test which reproduces the leak and 
verifies the fix - several test methods in this test will fail on Windows 
without the fix and will pass with the fix. The `GetInputStreamTest` is a new 
test which has been introduced to help improve the coverage of testing for the 
`InputStream` returned from the `FileURLConnection`.

tier1, tier2 and tier3 tests pass with this change.

-------------

Commit messages:
 - 8367561: fix inputstream leak
 - 8367561: introduce tests

Changes: https://git.openjdk.org/jdk/pull/27633/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27633&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367561
  Stats: 515 lines in 4 files changed: 483 ins; 11 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/27633.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27633/head:pull/27633

PR: https://git.openjdk.org/jdk/pull/27633

Reply via email to