I took a closer look at the URLClassLoader/URLClassPath/JarFile/ZipFile code, and it doesn't seem to handle the jars being modified from an external source; even though URLClassPath iterates through all the jars in order each time searching for a resource, the JarFile objects never get updated if the file is modified from an external source (and these objects are not accessible publicly)
Every JarLoader has a private JarFile, and JarLoader#getResource calls JarFile#getJarEntry, which ends up calling ZipFile#getEntry This ends up calling ZIP_GetEntry implemented in C (jdk/src/share/native/java/util/zip/zip_util.c), which iterates through a jzfile->entries, which is an array of jzcells (these are defined in zip_util.h) There's nothing that updates this structure if the file has been modified externally; it will always contain the original data when it opened the file, unless modified through the same object So as Alan said, modifying a JAR file in use leads to unpredictable behaviour I believe implementing a cache is possible without affecting the semantics of URLClassPath It would even be possible to cache the JarEntry/ZipEntry so it doesn't need to iterate through jzfile->entries for every JarFile The motivation is to 1. Remove linear checking all the JarLoaders to get a resource 2. Remove linear checking all the entries in a JarFile to see if it contains a resource I noticed a couple other complications: 1. We can cache entries in jar files, but resources may still show up in URLClassPath$Loaders and URLClassPath$FileLoaders "earlier" on the classpath, which have precedence 2. Not all JarLoaders open the JarFile in the constructor - depends on the meta index I noticed a couple other caveats though, which I'll talk about For the first problem, suppose these are the items in URLClassPath#loaders (an ArrayList<Loader>): 0. FileLoader 1. JarLoader 2. JarLoader And we call URLClassPath#getResource for the first time for "abc". It creates the FileLoader, calls FileLoader#getResource, and doesn't find "abc". It creates the JarLoader, calls JarLoader#getResource, and finds "abc". It also caches "def" and "ghi" which this jar contains, so URLClassPath has a map which points these resources to this loader. The loader keeps its own map of resources -> JarEntries. When we call URLClassPath#getResource for "def" or "ghi", we'll find entries in our cache/map, but it's important to check the first FileLoader as it comes before the JarLoader For the second problem, I found the meta index best documented in its source (online link: http://www.docjar.com/html/api/sun/misc/MetaIndex.java.html). To my understanding, the meta index provides information on which classes a jar file may contain (typically system jar files? the file is auto generated) When a meta index is present, the JarLoader doesn't open the JarFile eagerly because the meta index can say if a jar definitely doesn't contain a resource (it delays opening the jar until it has to) The workaround is to keep track of which Loaders have cached entries. There's no caching for Loader and FileLoader, and for JarLoaders where the meta index is present. With the cache, we can ignore all the JarLoaders with cached entries (i.e. your average jar files) I don't think the overhead for reading all the jar entries to build the cache is a problem because as it is JarFile already needs to do a linear search through all its entries when calling getJarEntry I implemented changes based on what I described above to test its performance/see if it would work. I'm running a basic HDFS client which reads data from a datanode. It loads ~1600 classes from ~100 jar library files. Normally, URLClassPath#getResource in total takes ~850ms. With the change, it takes ~550ms I included a patch below to show what I mean (in case the code is clearer than my description), and in case someone wants to try the changes themselves (my apologies if there's a better way to do this - I'm still new to the mailing list) Hopefully I explained my thoughts well, and I tried to explain the "why" in my comments in the code Any insight on why this would/wouldn't work or wouldn't be ideal would be greatly appreciated I'm just trying to get a better understand of how JVM (classloading) works, why it works that way, and how it could be improved Best regards, Adrian Patch to URLClassPath.java (also attached, but I'm not sure if attachments work) diff --git a/src/share/classes/sun/misc/URLClassPath.java b/src/share/classes/sun/misc/URLClassPath.java --- a/src/share/classes/sun/misc/URLClassPath.java +++ b/src/share/classes/sun/misc/URLClassPath.java @@ -86,6 +86,7 @@ /* Map of each URL opened to its corresponding Loader */ HashMap<String, Loader> lmap = new HashMap<String, Loader>(); + HashMap<String, Loader> cmap = new HashMap<String, Loader>(); /* The jar protocol handler to use when creating new URLs */ private URLStreamHandler jarHandler; @@ -195,6 +196,37 @@ } Loader loader; + Loader cachedLoader = cmap.get(name); + if (cachedLoader != null) { + // only loop through already initialized loaders + // if we have a cache entry for this resource, it means it exists in one of the already initialized JarLoaders + // we only have to check Loaders and FileLoaders that come before it + for (int i = 0; i < loaders.size(); i++) { + loader = getLoader(i); + // we checked all loaders prior to the cached loader we know contains this resource + if (loader == cachedLoader) { + break; + } + // the cache maps a resource to the first JarLoader that contains it* + // if the current loader is a JarLoader and it is not the cached hit, it doesn't contain the resource + // * based on the meta index, a JarLoader may not open a JarFile when constructed + // so some JarLoaders don't have cached entries; we still need to check them + // because they are before the cached loader + // Each JarLoader keeps track of whether it has cached its entries or not + // getAreEntriesCached will always be false for Loaders and FileLoaders + if (loader.getAreEntriesCached()) { + continue; + } + Resource res = loader.getResource(name, check); + if (res != null) { + return res; + } + } + // we tested all the prior loaders that could possibly contain the resource + // the resource was not found there, so we know our cached loader is the first loader that contains the resource + return cachedLoader.getResource(name, check); + } + for (int i = 0; (loader = getLoader(i)) != null; i++) { Resource res = loader.getResource(name, check); if (res != null) { @@ -363,7 +395,7 @@ return new Loader(url); } } else { - return new JarLoader(url, jarHandler, lmap); + return new JarLoader(url, jarHandler, lmap, cmap); } } }); @@ -471,12 +503,16 @@ private static class Loader implements Closeable { private final URL base; private JarFile jarfile; // if this points to a jar file + // whether or not this Loader has added its entries to URLClassPath's cache + // only set to true by JarLoader + protected boolean areEntriesCached; /* * Creates a new Loader for the specified URL. */ Loader(URL url) { base = url; + areEntriesCached = false; } /* @@ -583,6 +619,10 @@ URL[] getClassPath() throws IOException { return null; } + + public boolean getAreEntriesCached() { + return areEntriesCached; + } } /* @@ -595,6 +635,8 @@ private MetaIndex metaIndex; private URLStreamHandler handler; private HashMap<String, Loader> lmap; + private HashMap<String, Loader> cmap; + private HashMap<String, JarEntry> jemap; private boolean closed = false; private static final sun.misc.JavaUtilZipFileAccess zipAccess = sun.misc.SharedSecrets.getJavaUtilZipFileAccess(); @@ -604,13 +646,15 @@ * a JAR file. */ JarLoader(URL url, URLStreamHandler jarHandler, - HashMap<String, Loader> loaderMap) + HashMap<String, Loader> loaderMap, HashMap<String, Loader> classMap) throws IOException { super(new URL("jar", "", -1, url + "!/", jarHandler)); csu = url; handler = jarHandler; lmap = loaderMap; + cmap = classMap; + jemap = new HashMap<String, JarEntry>(); if (!isOptimizable(url)) { ensureOpen(); @@ -638,6 +682,23 @@ ensureOpen(); } } + + if (jar != null) { + // if the metaIndex is not null, it can be used to check if a jar definitely doesn't contain a resource + // in that case, the jar is not necessarily opened yet + Enumeration<JarEntry> entries = jar.entries(); + while (entries.hasMoreElements()) { + JarEntry je = entries.nextElement(); + String name = je.getName(); + if (!cmap.containsKey(name)) { + // URLClassPath's cmap keeps track of resource -> JarLoader + cmap.put(name, this); + // local jemap keeps track of resource -> JarEntry + jemap.put(name, je); + } + } + areEntriesCached = true; + } } @Override @@ -837,7 +898,9 @@ } catch (IOException e) { throw (InternalError) new InternalError().initCause(e); } - final JarEntry entry = jar.getJarEntry(name); + // no need to call JarFile#getJarEntry(name), + // which does a linear search of an internal data structure containing all zip file entries + final JarEntry entry = jemap.get(name); if (entry != null) return checkResource(name, check, entry); @@ -890,7 +953,7 @@ new PrivilegedExceptionAction<JarLoader>() { public JarLoader run() throws IOException { return new JarLoader(url, handler, - lmap); + lmap, cmap); } }); On Mon, Aug 31, 2015 at 6:07 AM, Alan Bateman <alan.bate...@oracle.com> wrote: > > > On 31/08/2015 04:02, Jonathan Yu wrote: >> >> Hi Adrian, >> >> It's possible for jar files to be modified while the JVM is running - is >> there some facility for detecting that an archive was modified and thus >> invalidating the cache? >> > Modifying a JAR file that is in use leads to unpredictable behavior and can > even cause crashes too (when JAR/zip files are memory mapped). That said, > the class path can extend over time, the most obvious being when JAR files > have the Class-Path attribute. It is possible to construct a cache at > run-time that is package name -> the first JAR file containing that package > but at the cost of scanning the contents of JAR files (probably on open). > There was a lot of experiments and prototypes in the past along these lines. > > -Alan.
diff --git a/src/share/classes/sun/misc/URLClassPath.java b/src/share/classes/sun/misc/URLClassPath.java --- a/src/share/classes/sun/misc/URLClassPath.java +++ b/src/share/classes/sun/misc/URLClassPath.java @@ -86,6 +86,7 @@ /* Map of each URL opened to its corresponding Loader */ HashMap<String, Loader> lmap = new HashMap<String, Loader>(); + HashMap<String, Loader> cmap = new HashMap<String, Loader>(); /* The jar protocol handler to use when creating new URLs */ private URLStreamHandler jarHandler; @@ -195,6 +196,37 @@ } Loader loader; + Loader cachedLoader = cmap.get(name); + if (cachedLoader != null) { + // only loop through already initialized loaders + // if we have a cache entry for this resource, it means it exists in one of the already initialized JarLoaders + // we only have to check Loaders and FileLoaders that come before it + for (int i = 0; i < loaders.size(); i++) { + loader = getLoader(i); + // we checked all loaders prior to the cached loader we know contains this resource + if (loader == cachedLoader) { + break; + } + // the cache maps a resource to the first JarLoader that contains it* + // if the current loader is a JarLoader and it is not the cached hit, it doesn't contain the resource + // * based on the meta index, a JarLoader may not open a JarFile when constructed + // so some JarLoaders don't have cached entries; we still need to check them + // because they are before the cached loader + // Each JarLoader keeps track of whether it has cached its entries or not + // getAreEntriesCached will always be false for Loaders and FileLoaders + if (loader.getAreEntriesCached()) { + continue; + } + Resource res = loader.getResource(name, check); + if (res != null) { + return res; + } + } + // we tested all the prior loaders that could possibly contain the resource + // the resource was not found there, so we know our cached loader is the first loader that contains the resource + return cachedLoader.getResource(name, check); + } + for (int i = 0; (loader = getLoader(i)) != null; i++) { Resource res = loader.getResource(name, check); if (res != null) { @@ -363,7 +395,7 @@ return new Loader(url); } } else { - return new JarLoader(url, jarHandler, lmap); + return new JarLoader(url, jarHandler, lmap, cmap); } } }); @@ -471,12 +503,16 @@ private static class Loader implements Closeable { private final URL base; private JarFile jarfile; // if this points to a jar file + // whether or not this Loader has added its entries to URLClassPath's cache + // only set to true by JarLoader + protected boolean areEntriesCached; /* * Creates a new Loader for the specified URL. */ Loader(URL url) { base = url; + areEntriesCached = false; } /* @@ -583,6 +619,10 @@ URL[] getClassPath() throws IOException { return null; } + + public boolean getAreEntriesCached() { + return areEntriesCached; + } } /* @@ -595,6 +635,8 @@ private MetaIndex metaIndex; private URLStreamHandler handler; private HashMap<String, Loader> lmap; + private HashMap<String, Loader> cmap; + private HashMap<String, JarEntry> jemap; private boolean closed = false; private static final sun.misc.JavaUtilZipFileAccess zipAccess = sun.misc.SharedSecrets.getJavaUtilZipFileAccess(); @@ -604,13 +646,15 @@ * a JAR file. */ JarLoader(URL url, URLStreamHandler jarHandler, - HashMap<String, Loader> loaderMap) + HashMap<String, Loader> loaderMap, HashMap<String, Loader> classMap) throws IOException { super(new URL("jar", "", -1, url + "!/", jarHandler)); csu = url; handler = jarHandler; lmap = loaderMap; + cmap = classMap; + jemap = new HashMap<String, JarEntry>(); if (!isOptimizable(url)) { ensureOpen(); @@ -638,6 +682,23 @@ ensureOpen(); } } + + if (jar != null) { + // if the metaIndex is not null, it can be used to check if a jar definitely doesn't contain a resource + // in that case, the jar is not necessarily opened yet + Enumeration<JarEntry> entries = jar.entries(); + while (entries.hasMoreElements()) { + JarEntry je = entries.nextElement(); + String name = je.getName(); + if (!cmap.containsKey(name)) { + // URLClassPath's cmap keeps track of resource -> JarLoader + cmap.put(name, this); + // local jemap keeps track of resource -> JarEntry + jemap.put(name, je); + } + } + areEntriesCached = true; + } } @Override @@ -837,7 +898,9 @@ } catch (IOException e) { throw (InternalError) new InternalError().initCause(e); } - final JarEntry entry = jar.getJarEntry(name); + // no need to call JarFile#getJarEntry(name), + // which does a linear search of an internal data structure containing all zip file entries + final JarEntry entry = jemap.get(name); if (entry != null) return checkResource(name, check, entry); @@ -890,7 +953,7 @@ new PrivilegedExceptionAction<JarLoader>() { public JarLoader run() throws IOException { return new JarLoader(url, handler, - lmap); + lmap, cmap); } });