2013/8/9 Romain Manni-Bucau <[email protected]>:
> When I tested on TomEE the gain was ridiculously small too, so maybe
> not the first place to hack on to make maven fast ;)


> On 9 August 2013 18:36, "Jason van Zyl" <[email protected]> wrote:
>> And what's the net difference then, before and after trying to
>> parallelize the classloading? I'll read up on the Java 7
>> classloading this weekend.

I think this really depends on how we're able to exploit it. Our
domain is partitioned into lots of small classloaders, so there should
be a bit of potential. How did you try to partition your classloading
in TomEE? From what I've seen of ASM performance, class loading is
mostly IO.

Within a single classloader I think you'd need some kind of
preemptive/recording-based strategy. Implementing that in the
ClassRealm class in Classworlds should be almost trivial, and unless
someone beats me to it, I'll do that over a few glasses of red wine
some time. (Record the class loading order from one invocation and
re-use it in another.)
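
Roughly the idea, as a minimal sketch only (RecordingClassLoader and
the plain-text recording file are made up for illustration; the real
hook would go into ClassRealm, which would also have to handle
concurrent loading more carefully):

import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecordingClassLoader extends URLClassLoader {

    // insertion-ordered so the recording preserves the loading order
    private final Set<String> loadedOrder =
            Collections.synchronizedSet(new LinkedHashSet<String>());

    RecordingClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        loadedOrder.add(name); // record every class as it is requested
        return super.loadClass(name, resolve);
    }

    void saveRecording(Path file) throws IOException {
        Files.write(file, new ArrayList<String>(loadedOrder),
                StandardCharsets.UTF_8);
    }

    // Replay a previous recording: warm the loader on a background
    // thread before the classes are first needed.
    static void preload(final ClassLoader loader, Path recording)
            throws IOException {
        final List<String> names =
                Files.readAllLines(recording, StandardCharsets.UTF_8);
        new Thread(new Runnable() {
            public void run() {
                for (String name : names) {
                    try {
                        Class.forName(name, false, loader); // load, don't initialize
                    } catch (ClassNotFoundException ignored) {
                        // the classpath may have changed between invocations
                    }
                }
            }
        }, "classload-warmup").start();
    }
}

The win, if there is one, comes from the warm-up thread doing the IO
while the main thread is busy with something else.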

Parallel construction of multiple classloaders should have some
potential, too.
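
Something along these lines, as a toy sketch (the RealmSpec type and
the entry-class-per-realm idea are my own assumptions, not how
plexus-classworlds actually wires things up):

import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRealms {

    // hypothetical description of one realm: its classpath plus one
    // class to touch eagerly, so the IO happens inside the task
    static final class RealmSpec {
        final URL[] classpath;
        final String entryClass;

        RealmSpec(URL[] classpath, String entryClass) {
            this.classpath = classpath;
            this.entryClass = entryClass;
        }
    }

    static List<ClassLoader> build(List<RealmSpec> specs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<ClassLoader>> futures =
                    new ArrayList<Future<ClassLoader>>();
            for (final RealmSpec spec : specs) {
                futures.add(pool.submit(new Callable<ClassLoader>() {
                    public ClassLoader call() throws Exception {
                        URLClassLoader loader = new URLClassLoader(spec.classpath);
                        // pulling in the entry class and its dependencies
                        // is where the actual IO happens
                        loader.loadClass(spec.entryClass);
                        return loader;
                    }
                }));
            }
            List<ClassLoader> realms = new ArrayList<ClassLoader>();
            for (Future<ClassLoader> f : futures) {
                realms.add(f.get());
            }
            return realms;
        } finally {
            pool.shutdown();
        }
    }
}

Since the realms are independent of each other, the only coordination
needed is waiting for all the futures at the end.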

As for "making maven fast", well that's a topic I've spent
considerable time & energy on.

Apart from class loading, pom loading, pom merging and artifact
resolution are basically the computationally intensive parts of the
maven core. Class loading and artifact resolution are the big ones;
the actual XML parsing/merging is really not that much.

Most of the inefficiencies are in plugins. And sometimes there are
inefficiencies related to layering. An example of this is
maven-install-plugin; it uses maven core to install (copy) the jar
file into the local repository, but then it re-reads the file to
calculate SHA1/MD5 checksums. Until recently it actually read the
files 3 times; I just reduced that to 2.
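
For illustration, one way to get down to a single read (a sketch only,
not the actual plugin code) is to compute the checksums while copying,
via DigestInputStream:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Writer;
import java.security.DigestInputStream;
import java.security.MessageDigest;

class InstallWithChecksums {

    static void install(File source, File target) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        MessageDigest md5 = MessageDigest.getInstance("MD5");

        // a single pass over the source feeds the copy and both digests
        try (InputStream in = new DigestInputStream(
                     new DigestInputStream(new FileInputStream(source), sha1), md5);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }

        writeChecksum(target, ".sha1", sha1.digest());
        writeChecksum(target, ".md5", md5.digest());
    }

    private static void writeChecksum(File target, String ext, byte[] digest)
            throws IOException {
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        try (Writer w = new FileWriter(new File(target.getPath() + ext))) {
            w.write(hex.toString());
        }
    }
}

The repository layout details are obviously more involved than this,
but the principle is simply to never read the artifact twice.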

I have been profiling the heck out of a bunch of builds, and the big
stuff is in the plugins. For maven core I think it is safe to say
you'll need to look for algorithmic improvements to gain anything
significant; stuff like requesting a bunch of artifacts from the
remote repository in one HTTP request comes to mind. One could work on
parallelizing classloading, which should be doable. Other than that
there's not much left.

As a pet theory for my really long runs in the woods, I've considered
parallelizing the entire pom loading, interpolation and artifact
resolution process. Unfortunately the massive amount of mutable state
within the maven model and the maven core makes this infeasible.
Simply put: the availability of setters all over the place allows the
construction of models/data to decay into spaghetti. Such spaghetti
also creates wasted computation, since the same values are recalculated
repeatedly. It also hinders parallelization. Maven core has its share
of such spaghetti. On my last long run in the woods I contemplated
writing another, totally immutable layer of objects beneath the current
objects and simply transferring all the state to the current model
objects when done. But we're looking at quite a tremendous effort to
catch that last second of wasted computation - better to spend that
energy optimizing plugins :)
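
For concreteness, a toy sketch of what I mean by an immutable layer
underneath (ResolvedDependency is a made-up value class;
org.apache.maven.model.Dependency is the existing mutable model object
it would feed):

import org.apache.maven.model.Dependency;

// Immutable value object: safe to share between threads, nothing to
// recompute, no setters for the construction to decay into spaghetti.
final class ResolvedDependency {

    private final String groupId;
    private final String artifactId;
    private final String version;

    ResolvedDependency(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Transfer the finished state into the existing mutable model
    // object only once everything has been computed.
    Dependency toModel() {
        Dependency d = new Dependency();
        d.setGroupId(groupId);
        d.setArtifactId(artifactId);
        d.setVersion(version);
        return d;
    }
}

All the heavy lifting (resolution, interpolation) would run against
objects like this, possibly in parallel, and the mutable model would
only ever see the finished result.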

On the non-radical front, parallel classloading is probably the last
"simple" thing that can be optimized in core.

For multi-module builds there's potential in re-using state/data
computed in one module for the next. Surefire could conceivably keep
the forked process alive between modules if the classpath is only
expanded, not otherwise changed, in the next run. Or surefire could
run an additional invocation early in the lifecycle and start the
forked VM while the compiler plugin is running (if it forks, which it
can decide early); although the actual .class files may not be
available yet, it already knows everything it needs to know.


Kristian
