Re: Cache which java classes are in a jar when opening jar the first time during classloading

Adrian Tue, 01 Sep 2015 01:02:28 -0700

I took a closer look at the
URLClassLoader/URLClassPath/JarFile/ZipFile code, and it doesn't seem
to handle the jars being modified from an external source; even though
URLClassPath iterates through all the jars in order each time
searching for a resource, the JarFile objects never get updated if the
file is modified from an external source (and these objects are not
accessible publicly)


Every JarLoader has a private JarFile, and JarLoader#getResource calls
JarFile#getJarEntry, which ends up calling ZipFile#getEntry
This ends up calling ZIP_GetEntry implemented in C
(jdk/src/share/native/java/util/zip/zip_util.c), which iterates
through a jzfile->entries, which is an array of jzcells (these are
defined in zip_util.h)
There's nothing that updates this structure if the file has been
modified externally; it will always contain the original data when it
opened the file, unless modified through the same object

So as Alan said, modifying a JAR file in use leads to unpredictable behaviour
I believe implementing a cache is possible without affecting the
semantics of URLClassPath
It would even be possible to cache the JarEntry/ZipEntry so it doesn't
need to iterate through jzfile->entries for every JarFile
The motivation is to
1. Remove linear checking all the JarLoaders to get a resource
2. Remove linear checking all the entries in a JarFile to see if it
contains a resource

I noticed a couple other complications:
1. We can cache entries in jar files, but resources may still show up
in URLClassPath$Loaders and URLClassPath$FileLoaders "earlier" on the
classpath, which have precedence
2. Not all JarLoaders open the JarFile in the constructor - depends on
the meta index
I noticed a couple other caveats though, which I'll talk about

For the first problem, suppose these are the items in
URLClassPath#loaders (an ArrayList<Loader>):
0. FileLoader
1. JarLoader
2. JarLoader

And we call URLClassPath#getResource for the first time for "abc".
It creates the FileLoader, calls FileLoader#getResource, and doesn't find "abc".
It creates the JarLoader, calls JarLoader#getResource, and finds "abc".
It also caches "def" and "ghi" which this jar contains, so
URLClassPath has a map which points these resources to this loader.
The loader keeps its own map of resources -> JarEntries.
When we call URLClassPath#getResource for "def" or "ghi", we'll find
entries in our cache/map, but it's important to check the first
FileLoader as it comes before the JarLoader

For the second problem, I found the meta index best documented in its
source  (online link:
http://www.docjar.com/html/api/sun/misc/MetaIndex.java.html).
To my understanding, the meta index provides information on which
classes a jar file may contain (typically system jar files? the file
is auto generated)
When a meta index is present, the JarLoader doesn't open the JarFile
eagerly because the meta index can say if a jar definitely doesn't
contain a resource (it delays opening the jar until it has to)

The workaround is to keep track of which Loaders have cached entries.
There's no caching for Loader and FileLoader, and for JarLoaders where
the meta index is present.
With the cache, we can ignore all the JarLoaders with cached entries
(i.e. your average jar files)

I don't think the overhead for reading all the jar entries to build
the cache is a problem because as it is JarFile already needs to do a
linear search through all its entries when calling getJarEntry

I implemented changes based on what I described above to test its
performance/see if it would work.
I'm running a basic HDFS client which reads data from a datanode.
It loads ~1600 classes from ~100 jar library files.
Normally, URLClassPath#getResource in total takes ~850ms.
With the change, it takes ~550ms

I included a patch below to show what I mean (in case the code is
clearer than my description), and in case someone wants to try the
changes themselves (my apologies if there's a better way to do this -
I'm still new to the mailing list)

Hopefully I explained my thoughts well, and I tried to explain the
"why" in my comments in the code

Any insight on why this would/wouldn't work or wouldn't be ideal would
be greatly appreciated
I'm just trying to get a better understand of how JVM (classloading)
works, why it works that way, and how it could be improved

Best regards,
Adrian

Patch to URLClassPath.java (also attached, but I'm not sure if attachments work)
diff --git a/src/share/classes/sun/misc/URLClassPath.java
b/src/share/classes/sun/misc/URLClassPath.java
--- a/src/share/classes/sun/misc/URLClassPath.java
+++ b/src/share/classes/sun/misc/URLClassPath.java
@@ -86,6 +86,7 @@

     /* Map of each URL opened to its corresponding Loader */
     HashMap<String, Loader> lmap = new HashMap<String, Loader>();
+    HashMap<String, Loader> cmap = new HashMap<String, Loader>();

     /* The jar protocol handler to use when creating new URLs */
     private URLStreamHandler jarHandler;
@@ -195,6 +196,37 @@
         }

         Loader loader;
+        Loader cachedLoader = cmap.get(name);
+        if (cachedLoader != null) {
+            // only loop through already initialized loaders
+            // if we have a cache entry for this resource, it means
it exists in one of the already initialized JarLoaders
+            // we only have to check Loaders and FileLoaders that
come before it
+            for (int i = 0; i < loaders.size(); i++) {
+                loader = getLoader(i);
+                // we checked all loaders prior to the cached loader
we know contains this resource
+                if (loader == cachedLoader) {
+                    break;
+                }
+                // the cache maps a resource to the first JarLoader
that contains it*
+                // if the current loader is a JarLoader and it is not
the cached hit, it doesn't contain the resource
+                // * based on the meta index, a JarLoader may not
open a JarFile when constructed
+                //   so some JarLoaders don't have cached entries; we
still need to check them
+                //   because they are before the cached loader
+                //   Each JarLoader keeps track of whether it has
cached its entries or not
+                //   getAreEntriesCached will always be false for
Loaders and FileLoaders
+                if (loader.getAreEntriesCached()) {
+                    continue;
+                }
+                Resource res = loader.getResource(name, check);
+                if (res != null) {
+                    return res;
+                }
+            }
+            // we tested all the prior loaders that could possibly
contain the resource
+            // the resource was not found there, so we know our
cached loader is the first loader that contains the resource
+            return cachedLoader.getResource(name, check);
+        }
+
         for (int i = 0; (loader = getLoader(i)) != null; i++) {
             Resource res = loader.getResource(name, check);
             if (res != null) {
@@ -363,7 +395,7 @@
                             return new Loader(url);
                         }
                     } else {
-                        return new JarLoader(url, jarHandler, lmap);
+                        return new JarLoader(url, jarHandler, lmap, cmap);
                     }
                 }
             });
@@ -471,12 +503,16 @@
     private static class Loader implements Closeable {
         private final URL base;
         private JarFile jarfile; // if this points to a jar file
+        // whether or not this Loader has added its entries to
URLClassPath's cache
+        // only set to true by JarLoader
+        protected boolean areEntriesCached;

         /*
          * Creates a new Loader for the specified URL.
          */
         Loader(URL url) {
             base = url;
+            areEntriesCached = false;
         }

         /*
@@ -583,6 +619,10 @@
         URL[] getClassPath() throws IOException {
             return null;
         }
+
+        public boolean getAreEntriesCached() {
+            return areEntriesCached;
+        }
     }

     /*
@@ -595,6 +635,8 @@
         private MetaIndex metaIndex;
         private URLStreamHandler handler;
         private HashMap<String, Loader> lmap;
+        private HashMap<String, Loader> cmap;
+        private HashMap<String, JarEntry> jemap;
         private boolean closed = false;
         private static final sun.misc.JavaUtilZipFileAccess zipAccess =
                 sun.misc.SharedSecrets.getJavaUtilZipFileAccess();
@@ -604,13 +646,15 @@
          * a JAR file.
          */
         JarLoader(URL url, URLStreamHandler jarHandler,
-                  HashMap<String, Loader> loaderMap)
+                  HashMap<String, Loader> loaderMap, HashMap<String,
Loader> classMap)
             throws IOException
         {
             super(new URL("jar", "", -1, url + "!/", jarHandler));
             csu = url;
             handler = jarHandler;
             lmap = loaderMap;
+            cmap = classMap;
+            jemap = new HashMap<String, JarEntry>();

             if (!isOptimizable(url)) {
                 ensureOpen();
@@ -638,6 +682,23 @@
                     ensureOpen();
                 }
             }
+
+            if (jar != null) {
+                // if the metaIndex is not null, it can be used to
check if a jar definitely doesn't contain a resource
+                // in that case, the jar is not necessarily opened yet
+                Enumeration<JarEntry> entries = jar.entries();
+                while (entries.hasMoreElements()) {
+                    JarEntry je = entries.nextElement();
+                    String name = je.getName();
+                    if (!cmap.containsKey(name)) {
+                        // URLClassPath's cmap keeps track of
resource -> JarLoader
+                        cmap.put(name, this);
+                        // local jemap keeps track of resource -> JarEntry
+                        jemap.put(name, je);
+                    }
+                }
+                areEntriesCached = true;
+            }
         }

         @Override
@@ -837,7 +898,9 @@
             } catch (IOException e) {
                 throw (InternalError) new InternalError().initCause(e);
             }
-            final JarEntry entry = jar.getJarEntry(name);
+            // no need to call JarFile#getJarEntry(name),
+            // which does a linear search of an internal data
structure containing all zip file entries
+            final JarEntry entry = jemap.get(name);
             if (entry != null)
                 return checkResource(name, check, entry);

@@ -890,7 +953,7 @@
                                 new PrivilegedExceptionAction<JarLoader>() {
                                     public JarLoader run() throws IOException {
                                         return new JarLoader(url, handler,
-                                            lmap);
+                                            lmap, cmap);
                                     }
                                 });

On Mon, Aug 31, 2015 at 6:07 AM, Alan Bateman <alan.bate...@oracle.com> wrote:
>
>
> On 31/08/2015 04:02, Jonathan Yu wrote:
>>
>> Hi Adrian,
>>
>> It's possible for jar files to be modified while the JVM is running - is
>> there some facility for detecting that an archive was modified and thus
>> invalidating the cache?
>>
> Modifying a JAR file that is in use leads to unpredictable behavior and can
> even cause crashes too (when JAR/zip files are memory mapped). That said,
> the class path can extend over time, the most obvious being when JAR files
> have the Class-Path attribute. It is possible to construct a cache at
> run-time that is package name -> the first JAR file containing that package
> but at the cost of scanning the contents of JAR files (probably on open).
> There was a lot of experiments and prototypes in the past along these lines.
>
> -Alan.

diff --git a/src/share/classes/sun/misc/URLClassPath.java 
b/src/share/classes/sun/misc/URLClassPath.java
--- a/src/share/classes/sun/misc/URLClassPath.java
+++ b/src/share/classes/sun/misc/URLClassPath.java
@@ -86,6 +86,7 @@
 
     /* Map of each URL opened to its corresponding Loader */
     HashMap<String, Loader> lmap = new HashMap<String, Loader>();
+    HashMap<String, Loader> cmap = new HashMap<String, Loader>();
 
     /* The jar protocol handler to use when creating new URLs */
     private URLStreamHandler jarHandler;
@@ -195,6 +196,37 @@
         }
 
         Loader loader;
+        Loader cachedLoader = cmap.get(name);
+        if (cachedLoader != null) {
+            // only loop through already initialized loaders
+            // if we have a cache entry for this resource, it means it exists 
in one of the already initialized JarLoaders
+            // we only have to check Loaders and FileLoaders that come before 
it
+            for (int i = 0; i < loaders.size(); i++) {
+                loader = getLoader(i);
+                // we checked all loaders prior to the cached loader we know 
contains this resource
+                if (loader == cachedLoader) {
+                    break;
+                }
+                // the cache maps a resource to the first JarLoader that 
contains it*
+                // if the current loader is a JarLoader and it is not the 
cached hit, it doesn't contain the resource
+                // * based on the meta index, a JarLoader may not open a 
JarFile when constructed
+                //   so some JarLoaders don't have cached entries; we still 
need to check them
+                //   because they are before the cached loader
+                //   Each JarLoader keeps track of whether it has cached its 
entries or not
+                //   getAreEntriesCached will always be false for Loaders and 
FileLoaders
+                if (loader.getAreEntriesCached()) {
+                    continue;
+                }
+                Resource res = loader.getResource(name, check);
+                if (res != null) {
+                    return res;
+                }
+            }
+            // we tested all the prior loaders that could possibly contain the 
resource
+            // the resource was not found there, so we know our cached loader 
is the first loader that contains the resource
+            return cachedLoader.getResource(name, check);
+        }
+
         for (int i = 0; (loader = getLoader(i)) != null; i++) {
             Resource res = loader.getResource(name, check);
             if (res != null) {
@@ -363,7 +395,7 @@
                             return new Loader(url);
                         }
                     } else {
-                        return new JarLoader(url, jarHandler, lmap);
+                        return new JarLoader(url, jarHandler, lmap, cmap);
                     }
                 }
             });
@@ -471,12 +503,16 @@
     private static class Loader implements Closeable {
         private final URL base;
         private JarFile jarfile; // if this points to a jar file
+        // whether or not this Loader has added its entries to URLClassPath's 
cache
+        // only set to true by JarLoader
+        protected boolean areEntriesCached;
 
         /*
          * Creates a new Loader for the specified URL.
          */
         Loader(URL url) {
             base = url;
+            areEntriesCached = false;
         }
 
         /*
@@ -583,6 +619,10 @@
         URL[] getClassPath() throws IOException {
             return null;
         }
+
+        public boolean getAreEntriesCached() {
+            return areEntriesCached;
+        }
     }
 
     /*
@@ -595,6 +635,8 @@
         private MetaIndex metaIndex;
         private URLStreamHandler handler;
         private HashMap<String, Loader> lmap;
+        private HashMap<String, Loader> cmap;
+        private HashMap<String, JarEntry> jemap;
         private boolean closed = false;
         private static final sun.misc.JavaUtilZipFileAccess zipAccess =
                 sun.misc.SharedSecrets.getJavaUtilZipFileAccess();
@@ -604,13 +646,15 @@
          * a JAR file.
          */
         JarLoader(URL url, URLStreamHandler jarHandler,
-                  HashMap<String, Loader> loaderMap)
+                  HashMap<String, Loader> loaderMap, HashMap<String, Loader> 
classMap)
             throws IOException
         {
             super(new URL("jar", "", -1, url + "!/", jarHandler));
             csu = url;
             handler = jarHandler;
             lmap = loaderMap;
+            cmap = classMap;
+            jemap = new HashMap<String, JarEntry>();
 
             if (!isOptimizable(url)) {
                 ensureOpen();
@@ -638,6 +682,23 @@
                     ensureOpen();
                 }
             }
+
+            if (jar != null) {
+                // if the metaIndex is not null, it can be used to check if a 
jar definitely doesn't contain a resource
+                // in that case, the jar is not necessarily opened yet
+                Enumeration<JarEntry> entries = jar.entries();
+                while (entries.hasMoreElements()) {
+                    JarEntry je = entries.nextElement();
+                    String name = je.getName();
+                    if (!cmap.containsKey(name)) {
+                        // URLClassPath's cmap keeps track of resource -> 
JarLoader
+                        cmap.put(name, this);
+                        // local jemap keeps track of resource -> JarEntry
+                        jemap.put(name, je);
+                    }
+                }
+                areEntriesCached = true;
+            }
         }
 
         @Override
@@ -837,7 +898,9 @@
             } catch (IOException e) {
                 throw (InternalError) new InternalError().initCause(e);
             }
-            final JarEntry entry = jar.getJarEntry(name);
+            // no need to call JarFile#getJarEntry(name),
+            // which does a linear search of an internal data structure 
containing all zip file entries
+            final JarEntry entry = jemap.get(name);
             if (entry != null)
                 return checkResource(name, check, entry);
 
@@ -890,7 +953,7 @@
                                 new PrivilegedExceptionAction<JarLoader>() {
                                     public JarLoader run() throws IOException {
                                         return new JarLoader(url, handler,
-                                            lmap);
+                                            lmap, cmap);
                                     }
                                 });

Re: Cache which java classes are in a jar when opening jar the first time during classloading

Reply via email to