Hi, I posted about this a few weeks ago and got some responses raising concerns, which I addressed in my last email, though I didn't hear back. My apologies if I shouldn't be sending this; I'm not sure what the protocol is for this kind of follow-up.

Classloading in a standard Java application with jars on the classpath currently does a linear search through every jar on the classpath, and every entry in each jar, for every class loaded. As URLClassPath searches for an entry/resource/class, it could cache each entry it encounters -> where to find it, so that if a resource has already been seen we don't need to repeat the two-dimensional (jars x entries) search.

Original thread (August list): http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035009.html
Last message (September list): http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-September/035016.html

I got 3 responses: 2 concerning changes to jars at runtime (invalidating the cache), and 1 saying you're not supposed to modify jars at runtime (I can confirm this via the source code and manual testing - it crashes the JVM).

In my last message I addressed:
- jars being modified (which you're not supposed to do; the current classloader does not handle this) or the classpath changing (only possible by making fields/methods public via reflection, and this could easily be handled gracefully)
- some details of the resource-finding process (e.g. the meta index, and when the cache for jar entries can't be used because of the semantics of other loaders/types of entries on the classpath)
- a reference implementation of caching that I believe is simple and compliant with existing functionality
- some basic numbers on performance

---

So in this email I wanted to explain the problem again, hopefully more clearly now.

URLClassPath is used by URLClassLoader to find classes, though it could be used for finding any resource on a classpath. URLClassPath keeps an array of URLs, which are typically folders or jars on the local filesystem. They can also be http, ftp, or other URLs, but that doesn't affect this problem.

To find a resource/class (URLClassPath#getResource), it:
1. Loops through the URLs in order
2. Creates Loader objects for each URL lazily (URLClassPath$FileLoader or URLClassPath$JarLoader). So if the Loader for the first URL finds all the resources, Loaders for the remaining entries on the classpath are never created/looked at
3. Calls Loader#getResource and returns the resource if found (otherwise it keeps searching)

URLClassPath$JarLoader creates its corresponding JarFile either in the constructor or in getResource (depending on the meta index - I explained the details in my last email and won't repeat them here).

When a JarFile is created, it opens the jar file on the file system, reads the central directory of the jar/zip file, and creates an internal linked list with all its entries.

JarFile objects are immutable: you can only open them for read/delete (see the constructor API http://docs.oracle.com/javase/7/docs/api/java/util/jar/JarFile.html#JarFile(java.io.File,%20boolean,%20int) ), they do not detect if the file has been modified externally, and you can only "append" or "delete" entries by creating a new jar (JarOutputStream).

When URLClassPath$JarLoader#getResource calls JarFile#getEntry, the C code searches through the linked list (jdk/src/share/native/java/util/zip/zip_util.c, ZIP_GetEntry - jar files are just zip files, and the Java JarFile class just extends ZipFile).

Since the order in which jar files and jar entries are searched is invariant, we can create a map of resource -> first jar which contains it.

However, we don't want to introduce additional overhead. When a JarFile is created, it already builds the internal linked list - that's O(number of entries). I propose that after the JarLoader creates the JarFile, it iterates through its entries and adds them to the cache (skipping any resource the map already contains, i.e. one that an earlier jar provides).

This adds a small overhead when instantiating loaders - but creating the JarLoader/JarFile is still technically O(number of entries), and now getResource is constant time, instead of requiring a linear search through every jar and the linked list of entries for each jar (O(number of jars * entries/jar)).

There are several caveats where the cache cannot be used, involving non-jar URLs on the classpath and the meta index; I explained them in my last email and in comments in the reference implementation.

---

Regarding modified jars:
- moved/renamed: the file handle is still valid and it doesn't affect the JVM/classloading
- deleted: the file handle is still valid and it doesn't affect the JVM/classloading
- modified: the JVM crashes

The first two may not be intuitive, but remember that file handles point to files, not to paths on the filesystem. So even though a jar appears renamed in the shell, the Java process has already opened the file and holds its file handle somewhere in the C implementation of file objects. When the JVM issues the read system call on that handle, say to read a class from the jar file, it all works fine.

For what it's worth, here's a Stack Overflow answer as a "source": http://stackoverflow.com/questions/2028874/what-happens-to-an-open-file-handler-on-linux-if-the-pointed-file-gets-moved-de

---

There is a protected method URLClassLoader#addURL which appends a URL to the classpath. People could use reflection to make it public. Because jars are opened lazily, and the cache is also built lazily whenever a jar is opened, it doesn't matter if paths are appended.

Regarding people making extensive use of reflection to modify the order of entries on the classpath, I believe that's irrelevant, as that's clearly not the intended semantics of URLClassLoader/URLClassPath. People who need custom classloading rules create custom classloaders; that's the purpose of classloaders being extensible.

---

Anyway, I hope this is worth discussing.
I've looked into this a lot and believe I haven't missed anything, but if someone knows why it hasn't been or can't be done, any insight would be appreciated! Alan said in the last email thread that "There was a lot of experiments and prototypes in the past along these lines" - are the results public? He also mentioned improving classloading in Java's upcoming module system (originally planned for Java 7, currently delayed to Java 9), but I believe the algorithmic complexity and performance of URLClassLoader could be improved without complicated changes.

Please let me know what you think, and thanks for your time!

Best regards,
Adrian
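
To make the "resource -> first jar which contains it" idea concrete, here is a minimal sketch of such a map. It is not the reference implementation from the earlier thread and it does not touch URLClassPath itself; the class name FirstMatchIndex and its methods are hypothetical, chosen only for illustration. It assumes jars are recorded in classpath order, and it ignores the caveats mentioned above (non-jar URLs, the meta index).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical first-match index: resource name -> first jar that contains it. */
public class FirstMatchIndex {
    private final Map<String, String> firstJarByResource = new HashMap<>();

    /**
     * Records the entry names of one jar. Callers must record jars in
     * classpath order; putIfAbsent keeps the earliest jar for each name.
     */
    public void record(String jarPath, Iterable<String> entryNames) {
        for (String name : entryNames) {
            firstJarByResource.putIfAbsent(name, jarPath);
        }
    }

    /** Convenience overload: reads the entry names from a jar on disk. */
    public void record(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            List<String> names = new ArrayList<>();
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            record(jar.getPath(), names);
        }
    }

    /**
     * Returns the first recorded jar containing the resource, or null if no
     * jar recorded so far contains it. Since jars are opened lazily, null
     * only means "keep searching the jars that have not been opened yet".
     */
    public String lookup(String resourceName) {
        return firstJarByResource.get(resourceName);
    }
}
```

For example, after record("a.jar", ...) and then record("b.jar", ...) where both jars contain "shared.properties", lookup("shared.properties") returns "a.jar", matching the result of searching the classpath in order.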
