On 4/02/2013 9:11 PM, David Holmes wrote:
On 4/02/2013 6:42 PM, Peter Levart wrote:
I might have something usable, but I just wanted to verify some things
beforehand. What I investigated was simply keeping a cache of lock
objects in a map, weakly referenced. In your blog:

https://blogs.oracle.com/dholmes/entry/parallel_classloading_revisited_fully_concurrent


...you describe this as a 3rd alternative:

/3. Reduce the lifetime of lock objects so that entries are removed
from the map when no longer needed (eg remove after loading, *use
weak references* to the lock objects and cleanup the map periodically)./


...but later you preclude this option:

/Similarly we might reason that we can remove a mapping (and the
lock object) because the class is already loaded, but this would
again violate the specification because it can be reasoned that the
following assertion should hold true: /

|
Object lock1 = loader.getClassLoadingLock(name);
loader.loadClass(name);
Object lock2 = loader.getClassLoadingLock(name);
assert lock1 == lock2;
|

/Without modifying the specification, or at least doing some
creative wordsmithing on it, options 1 and 3 are precluded. /


When using WeakReferences to cache lock Objects, the above assertion
would still hold true, wouldn't it?

No. The WeakReference can be cleared causing lock2 to be a different
object.

As Peter has pointed out to me, this is of course not correct. The only way you can write that assert is to have maintained a strong reference to the lock object, hence the WeakReference cannot be cleared.

I need to re-consider this approach as my previous thoughts on it were flawed.
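
To make the corrected reasoning concrete, here is a minimal sketch, assuming a hypothetical weak-valued lock map; the class, field and method names are purely illustrative, not any actual ClassLoader implementation:

    // Illustrative only: a hypothetical weak-valued lock map, not the real
    // ClassLoader code.
    import java.lang.ref.WeakReference;
    import java.util.concurrent.ConcurrentHashMap;

    class WeakLockMapSketch {
        private final ConcurrentHashMap<String, WeakReference<Object>> locks =
                new ConcurrentHashMap<>();

        Object getClassLoadingLock(String name) {
            for (;;) {
                WeakReference<Object> ref = locks.get(name);
                Object lock = (ref == null) ? null : ref.get();
                if (lock != null) {
                    return lock; // caller now holds a strong reference
                }
                Object newLock = new Object();
                WeakReference<Object> newRef = new WeakReference<>(newLock);
                boolean installed = (ref == null)
                        ? locks.putIfAbsent(name, newRef) == null
                        : locks.replace(name, ref, newRef);
                if (installed) {
                    return newLock;
                }
                // lost a race with another thread; retry
            }
        }
    }

While the caller still holds lock1 strongly, the WeakReference for that name cannot be cleared (a weak reference is only cleared once its referent is no longer strongly reachable), so the second getClassLoadingLock call returns the same object and the spec's assert holds.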

David
-----

I cannot think of any reasonable 3rd-party ClassLoader code that would
behave differently with lock objects strongly referenced for the entire
VM lifetime vs. having them temporarily weakly referenced and eventually
recreated if needed. For example, only code that does one of the
following things can see the difference:

* uses .toString() or .hashCode() on a lock object and keeps the result
somewhere, without also keeping the lock object itself, to compare later
* wraps a lock object into a WeakReference and observes whether the
reference gets cleared or not (see the sketch below)
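
For instance, a loader subclass along these lines could tell the two schemes apart (purely illustrative; ObservantLoader and probe are hypothetical names, not anything proposed here):

    // Illustrative only: code that could observe whether lock objects are
    // strongly retained for the VM lifetime or handed out from a weak cache.
    import java.lang.ref.WeakReference;

    class ObservantLoader extends ClassLoader {
        void probe(String name) throws Exception {
            int id = System.identityHashCode(getClassLoadingLock(name));
            WeakReference<Object> ref = new WeakReference<>(getClassLoadingLock(name));
            loadClass(name);
            System.gc();
            // With strongly referenced locks, ref is never cleared and the
            // identity hash code of a later lock always matches; with a weak
            // cache the entry may have been collected and recreated by now.
            System.out.println("cleared: " + (ref.get() == null));
            System.out.println("same lock: "
                    + (id == System.identityHashCode(getClassLoadingLock(name))));
        }
    }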

But that is the point - we don't know what any actual external
classloader does with the classloading lock, so we can't just
arbitrarily change the existing specification without being very sure
about the implications of making that change.

Such a change would have to have been proposed well before now so there
was time to evaluate the impact.

David
-----

Is that a reasonable assumption to continue in this direction? If the
semantics are reasonably OK, then all the solution has to prove is
(space and time) performance, right?

Here's a preliminary illustration of what can be achieved space-wise.
This is a test that attempts to load all the classes from rt.jar.
The situation we have now (using -Xms256m -Xmx256m and 32-bit addresses):

...At the beginning of main()

Total memory: 257294336 bytes
Free memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 7936 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 30848 bytes
Deep size of both: 38784 bytes (reference)

...Attempted to load: 18558 classes in: 1964.55825 ms

Total memory: 257294336 bytes
Free memory: 227314112 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162184 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215216 bytes
Deep size of both: 3377400 bytes (difference to reference: 3338616 bytes)

...Performing gc()

...Loading class: test.TestClassLoader$Last (to trigger expunging)

Total memory: 260440064 bytes
Free memory: 193163368 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162328 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215408 bytes
Deep size of both: 3377736 bytes (difference to reference: 3338952 bytes)


vs. having lock objects weakly referenced and doing expunging work at
each request for a lock:

...At the beginning of main()

Total memory: 257294336 bytes
Free memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 9584 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 33960 bytes
Deep size of both: 43544 bytes (reference)
Lock stats...
create: 108
return old: 0
replace: 0
expunge: 0

...Attempted to load: 18558 classes in: 2005.14628 ms

Total memory: 257294336 bytes
Free memory: 187198776 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 572768 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 1122976 bytes
Deep size of both: 1695744 bytes (difference to reference: 1652200 bytes)
Lock stats...
create: 37302
return old: 201
replace: 0
expunge: 25893

...Performing gc()

...Loading class: test.TestClassLoader$Last (to trigger expunging)

Total memory: 257294336 bytes
Free memory: 238693336 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 78944 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 168512 bytes
Deep size of both: 247456 bytes (difference to reference: 203912 bytes)
Lock stats...
create: 2
return old: 0
replace: 0
expunge: 11517


... as can be seen from this particular use case, there's approximately
20% storage overhead for the locks because of the WeakReference
indirection (at the beginning of main(), before any expunging kicks in),
and the performance overhead appears negligible, about 2% of the total
class loading time. We also see that (since this is a single-threaded
example) re-use of a lock for a class that is already loaded or being
loaded is rare (I assume only explicit requests like Class.forName
trigger that event in this example). At the end, almost all locks are
eventually released, which frees 3 MB+ of heap space.

Here's a piece of code for obtaining locks (coded as a subclass of
ConcurrentHashMap for performance reasons):

    public Object getOrCreate(K key) {
        // The most common situation is that the key is new, so optimize
        // the fast path accordingly: create the lock and its weak ref first.
        Object lock = new Object();
        LockRef<K> ref = new LockRef<>(key, lock, refQueue);
        expungeStaleEntries();
        for (;;) {
            @SuppressWarnings("unchecked")
            LockRef<K> oldRef = (LockRef<K>) super.putIfAbsent(key, ref);
            if (oldRef == null) {
                // no previous entry - our new lock wins
                if (keepStats) createCount.increment();
                return lock;
            }
            else {
                Object oldLock = oldRef.get();
                if (oldLock != null) {
                    // live entry already present - return the existing lock
                    if (keepStats) returnOldCount.increment();
                    return oldLock;
                }
                else if (super.replace(key, oldRef, ref)) {
                    // stale (cleared) entry replaced with our new lock
                    if (keepStats) replaceCount.increment();
                    return lock;
                }
                // otherwise we lost a race with another thread - retry
            }
        }
    }

    @SuppressWarnings("unchecked")
    private void expungeStaleEntries() {
        LockRef<K> ref;
        while ((ref = (LockRef<K>) refQueue.poll()) != null) {
            // remove only if the mapping still points at the cleared reference
            super.remove(ref.key, ref);
            if (keepStats) expungeCount.increment();
        }
    }
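
For context, a hedged sketch of how a loader could wire this in; the LockMap class name and the parallelLockMap field are my assumptions for illustration, not necessarily what the actual patch looks like:

    // Illustrative usage only; LockMap/parallelLockMap are assumed names.
    private final LockMap<String> parallelLockMap = new LockMap<>();

    protected Object getClassLoadingLock(String className) {
        Object lock = this;
        if (parallelLockMap != null) {
            // returns the existing lock while some thread still holds it
            // strongly; otherwise creates (and caches) a fresh one
            lock = parallelLockMap.getOrCreate(className);
        }
        return lock;
    }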


Do you think this is something worth pursuing further?


Regards, Peter


On 02/01/2013 05:01 AM, David Holmes wrote:
Hi Peter,

On 31/01/2013 11:07 PM, Peter Levart wrote:
Hi David,

Could the parallel classloading be at least space optimized somehow in
the JDK8 timeframe if there was a solution ready?

If there is something that does not impact any of the existing
specified semantics regarding the classloader lock object then it may
be possible to work it into an 8 update if not 8 itself. But all the
suggestions I've seen for reducing the memory usage also alter the
semantics in some way.

However, a key part of the concurrent classloader proposal was that it
didn't change the behaviour of any existing classloaders outside the
core JDK. Anything that changes existing behaviour has a much higher
compatibility bar to get over.

David
-----
