Re: Cache which java classes are in a jar when opening jar the first time during classloading

David Holmes Sun, 30 Aug 2015 23:33:07 -0700

On 31/08/2015 4:01 PM, Adrian wrote:

Hi Jonathan,


I'm not aware of any specific facilities for detecting that a jar file
was modified.
As far as I can tell, the JVM does not update its internal structures
(the jzfile and jzcells and jzentries) if the jar is modified by an
outside source.

If I am wrong about that (I don't see any code in class loading/jar
files that updates these structures after an outside modification), a
class -> jar map could still be used to figure out which jar to start
at - something like:

Loader l = cmap.get(name);
if (l != null) {
     Resource res = l.getResource(name);
     if (res != null) {
         return res;
     } else {
         // invalidate cache
         cmap.remove(name);
     }
}

// existing code
for (int i = 0; (loader = getLoader(i)) != null; i++) {

So it will "lazily" invalidate a cache entry when it can't actually
find the resource in the cached jar.
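
A self-contained sketch of that lookup, for illustration only. CachedClassPath and its nested Loader are hypothetical stand-ins for URLClassPath and its loaders, not the JDK classes; resources are modelled as plain strings, and thread safety is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for URLClassPath; names and types are illustrative only.
final class CachedClassPath {

    // Hypothetical stand-in for one classpath entry (for example one jar).
    static final class Loader {
        final String label;
        // Mutable on purpose, to mimic a jar whose contents change on disk.
        final Set<String> names = new HashSet<>();

        Loader(String label, String... initialNames) {
            this.label = label;
            names.addAll(Arrays.asList(initialNames));
        }

        // Returns a description of the resource, or null if this entry lacks it.
        String getResource(String name) {
            return names.contains(name) ? label + "!/" + name : null;
        }
    }

    private final List<Loader> loaders = new ArrayList<>();
    private final Map<String, Loader> cmap = new HashMap<>();

    CachedClassPath(Loader... classpath) {
        loaders.addAll(Arrays.asList(classpath));
    }

    String getResource(String name) {
        // Fast path: ask the loader that served this name last time.
        Loader cached = cmap.get(name);
        if (cached != null) {
            String res = cached.getResource(name);
            if (res != null) {
                return res;
            }
            cmap.remove(name); // stale entry: fall back to the full search
        }
        // Slow path: linear search in classpath order, remembering the hit.
        for (Loader loader : loaders) {
            String res = loader.getResource(name);
            if (res != null) {
                cmap.put(name, loader);
                return res;
            }
        }
        return null;
    }
}
```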

If a resource was added to a jar closer to the beginning of the
classpath, you would want to load the file from the earlier jar, which
this cache would not notice.
Of course I don't know for sure, but I can't really imagine use cases
for doing that.

Such behaviour is not prohibited though, so any caching mechanism would
have to allow for this - and I don't see how it could effectively do so.

If I understand correctly, classloaders are intended to be extensible
so that people can write their custom classloaders for tricky use
cases - in which case they can implement a classloader whose
getResource doesn't use a cache

I think this would have to be the other way around. If your classloader
constrains dynamic changes to how it locates classes then it can
implement a cache.

I believe the upcoming module system would address this issue to some
extent.


David
-----

However, I think the majority of java applications with long
classpaths would benefit from this - for example, I believe HDFS has
~100 jars on its classpath

Thanks for your reply!
Please let me know what you think of this

Best regards,
Adrian



On Sun, Aug 30, 2015 at 11:02 PM, Jonathan Yu <jaw...@cpan.org> wrote:

Hi Adrian,

It's possible for jar files to be modified while the JVM is running - is
there some facility for detecting that an archive was modified and thus
invalidating the cache?

Also, I wonder how class data sharing might interact with this, though I'll
admit that I don't know much about HotSpot (I use the IBM JVM).


On Sun, Aug 30, 2015, 18:20 Adrian <withoutpoi...@gmail.com> wrote:


Hello,

I have been looking through the JVM source related to class loading.
URLClassLoader#findClass calls URLClassPath#getResource
URLClassPath creates a "loader" for every entry on the classpath (e.g.
one JarLoader per jar file)

In getResource, it loops through all its loaders in order,
instantiating them lazily.
For example, it will only create a JarLoader and open a jar file
somewhere "farther along" the classpath if it did not find the
resource in all the prior jars

URLClassLoader#findClass and URLClassPath#getResource are doing linear
searches on all the entries on the classpath every time they need to
load a resource

For a jar file, if there is an index in META-INF (META-INF/INDEX.LIST),
at least the corresponding loader can figure out right away whether the
jar contains a class.
If not, it searches an internal array/data structure created from the
zipfile central directory (see
jdk/src/share/native/java/util/zip/zip_util.c ZIP_GetEntry - if you
follow the call hierarchy from URLClassPath$JarLoader#getResource, you
end up at this function)

If the jars on the classpath are optimal (majority of the classes are
in the first few jars), there is not much overhead
However, when classes are located in multiple jars along the
classpath, the JVM spends nontrivial time searching through all of
them

One possible "solution" would be to create a map of all resources ->
which jar/jar loader they belong in whenever a jar file is opened.
This can be done by iterating over JarFile#entries(), which just reads
the central directory from the jar/zip file (which is done anyways to
create some additional data structures when opening a jar/zip file)
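
A rough sketch of building such a map, not the actual patch. JarResourceIndex and build are hypothetical names, and for simplicity the sketch indexes a whole list of jars eagerly and closes each one again, whereas the idea above is to fill the map lazily as each jar is first opened. putIfAbsent keeps the first jar in classpath order for a given name, which matches the order a linear search would find it in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper; not part of the JDK.
final class JarResourceIndex {

    // Maps each resource name to the first jar on the classpath that contains it.
    static Map<String, Path> build(List<Path> classpath) {
        Map<String, Path> index = new HashMap<>();
        for (Path jar : classpath) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                // entries() enumerates the entries listed in the central directory.
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    if (!entry.isDirectory()) {
                        // Earlier jars win, as they would in a linear search.
                        index.putIfAbsent(entry.getName(), jar);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException("cannot read " + jar, e);
            }
        }
        return index;
    }
}
```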

I implemented this to try it out and for a java program with ~1800
classes, it improved the find class time (taken from
sun.misc.PerfCounter.getFindClassTime()) from ~1.4s to ~1s

I tried to think of reasons why this was not done already; looking
through the code, I believe the semantics of the loaders remain the
same.
There is technically a memory overhead of saving this map of resources
-> jar files/loaders, but it improves the lookup complexity from
O(number of jars on classpath) to O(1)

Would appreciate any feedback/insight as to whether this would be a
good change or why it is the way it currently is.
Thank you!

Best regards,
Adrian
