Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Dan Fabulich Sun, 22 Nov 2009 12:06:53 -0800

I like it!

Well, except for the "1 thread per module" part; that's clearly too many threads. You'd want a fixed thread pool.

But restructuring the multithreading around the individual phases and *scheduling* phases from later projects when earlier project phases are done seems workable.

We probably would have thought of this earlier (or, at least, *I* would have) if the default reactor behavior worked like that.

Today, "mvn install" will first compile, test, and install project A, then compile, test, and install project B, and then compile, test, and install project C.

But even without multithreading, you could imagine the reactor compiling A, then compiling B, then compiling C, then testing A, then testing B, then testing C, and finally installing A, installing B, and installing C. This strategy might fail faster than today's project-by-project strategy, since a compile error in C would show up before any tests have run.

I propose that we first implement this as an optional reactor strategy (via a special command-line argument) in single-threaded mode, and work out all the kinks. Once we're pretty happy with that, we can add support for it in multithreaded mode. It's especially important that it be at least *possible* to run it in single-threaded mode, in case it causes problems for some projects independently of multithreading.

BTW, what would we call this new mode? Perhaps we'd call it "weave" mode, because we're going across the projects horizontally in lifecycle order. I initially thought we might call it "breadth-first" as opposed to "depth-first," but that's a bad name because it sounds like we're reordering the projects.
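To make the two orderings concrete, here is a minimal Java sketch. The class and method names are made up for illustration; it assumes the project list is already in reactor order and treats each named phase as a single step (real Maven also runs every earlier phase of the lifecycle). "Weave" mode only swaps the loop nesting; the project order itself is unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Maven code): today's project-by-project
// reactor order versus the proposed "weave" order. Projects are assumed to
// already be in reactor (dependency) order.
final class WeaveOrder {

    // Today's behavior: run every phase of one project before the next project.
    static List<String> projectByProject(List<String> projects, List<String> phases) {
        List<String> steps = new ArrayList<>();
        for (String project : projects) {
            for (String phase : phases) {
                steps.add(project + ":" + phase);
            }
        }
        return steps;
    }

    // "Weave" mode: run one phase across all projects before the next phase.
    // Only the loop nesting changes; projects keep their reactor order.
    static List<String> weave(List<String> projects, List<String> phases) {
        List<String> steps = new ArrayList<>();
        for (String phase : phases) {
            for (String project : projects) {
                steps.add(project + ":" + phase);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> projects = List.of("A", "B", "C");
        List<String> phases = List.of("compile", "test", "install");
        System.out.println(projectByProject(projects, phases));
        System.out.println(weave(projects, phases));
    }
}
```

With projects A, B, C and phases compile, test, install, `weave` yields A:compile, B:compile, C:compile, A:test, B:test, C:test, A:install, B:install, C:install, which is the order described above.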



So, where might we find kinks in "weave" mode?

* Correctly implementing reactor failure behavior (--fail-fast, --fail-at-end, --fail-never) with blacklisting

* What happens if we specify multiple lifecycle phases?  "mvn compile test"

* What happens if we just specify the raw goals? "mvn myPlugin:goal"

* What if we mix and match? "mvn compile myPlugin:goal test"

* What if we put clean last? Would we clean projects while later projects depend on them? "mvn compile clean"

* What if the reactor is building a plugin that is used later in the reactor?

* How would users resume "weave" mode? (Today we allow users to --resume-from a particular project.) Would "weave" users resume from a particular project+phase? Would resuming even be reasonable? If you changed a class, you'd need to recompile it and THEN retest it...

These are the sort of areas where we'd want to have a good single-threaded implementation with integration tests BEFORE plowing ahead with a multithreaded implementation.
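For the first bullet, one possible shape of blacklisting in "weave" mode is sketched below: once a project fails in any phase, that project and everything downstream of it are skipped for all remaining phases, while unrelated projects carry on (roughly --fail-at-end behavior). This is a hypothetical Java illustration with made-up names, not the reactor's actual code; `downstream` is assumed to map each project to the projects that directly depend on it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: when a project fails, it and every project that
// depends on it (directly or transitively) are blacklisted, so later phases
// skip them while unrelated projects keep going.
final class Blacklist {

    // downstream maps a project to the projects that directly depend on it.
    static Set<String> blacklistFrom(String failed, Map<String, List<String>> downstream) {
        Set<String> blacklisted = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(failed);
        while (!work.isEmpty()) {
            String project = work.pop();
            // add() returns false if already blacklisted, so each project is expanded once.
            if (blacklisted.add(project)) {
                for (String dependent : downstream.getOrDefault(project, List.of())) {
                    work.push(dependent);
                }
            }
        }
        return blacklisted;
    }

    public static void main(String[] args) {
        // B and C depend on A; D depends on C; E is independent.
        Map<String, List<String>> downstream = Map.of(
                "A", List.of("B", "C"),
                "C", List.of("D"));
        // Blacklists C and D; A, B and E are unaffected.
        System.out.println(blacklistFrom("C", downstream));
    }
}
```

In weave mode the blacklist would have to be consulted before every (project, phase) step, since a failure in A's test phase must also cancel B's and C's later phases.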


-Dan

Kristian Rosenvold wrote:

I've looked over the code and thought a bit further about the
constraints involved, and given that:

- Multi-module reactor builds are the only interesting targets of
multithreading.
- Reactor builds do not use the "install" output of their upstream
dependencies (I was not aware of that ;)

You do not have to re-order anything at all. An implementation
could just:
A) Immediately fork 1 thread per module for all modules.
B) For the phases compile, install and deploy, a given module can
only proceed when all its upstream dependencies have completed the same
phase.
There's still a chance of leaking artifacts to the local repository if
upstream deploy fails after install, and the general idea of a
transacted repo would still be nice for consistency.
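A minimal Java sketch of rule B), using a fixed thread pool rather than one thread per module, as suggested at the top of this thread. Each (module, phase) step is a CompletableFuture that waits for the module's own previous phase and, for the gated phases, for the same phase in every upstream module. The phase list is trimmed to compile/test/install, the class name is hypothetical, and logging the step name stands in for actually running the phase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Maven code: every (module, phase) step is a
// CompletableFuture chained on its prerequisites, so pool threads never
// block waiting for other modules and a small fixed pool is enough.
final class PhaseScheduler {

    static final List<String> PHASES = List.of("compile", "test", "install");
    // Phases that require upstream modules to finish the same phase first (rule B).
    static final List<String> GATED = List.of("compile", "install");

    // reactorOrder: modules in dependency order; upstream: module -> direct dependencies.
    // Returns the order in which the steps actually ran.
    static List<String> run(List<String> reactorOrder, Map<String, List<String>> upstream, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> steps = new HashMap<>(); // key: "module:phase"
        try {
            for (String module : reactorOrder) {
                CompletableFuture<Void> previous = CompletableFuture.completedFuture(null);
                for (String phase : PHASES) {
                    List<CompletableFuture<Void>> prereqs = new ArrayList<>();
                    prereqs.add(previous);
                    if (GATED.contains(phase)) {
                        for (String up : upstream.getOrDefault(module, List.of())) {
                            // Already present because reactorOrder lists upstream modules first.
                            prereqs.add(steps.get(up + ":" + phase));
                        }
                    }
                    String step = module + ":" + phase;
                    CompletableFuture<Void> current = CompletableFuture
                            .allOf(prereqs.toArray(new CompletableFuture[0]))
                            .thenRunAsync(() -> log.add(step), pool); // a real build would run the phase here
                    steps.put(step, current);
                    previous = current;
                }
            }
            CompletableFuture.allOf(steps.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        // B and C both depend on A; with 2 threads they may run concurrently.
        Map<String, List<String>> upstream = Map.of("B", List.of("A"), "C", List.of("A"));
        System.out.println(run(List.of("A", "B", "C"), upstream, 2));
    }
}
```

Because no pool thread waits on another module, the pool size only limits how many steps run at once, not which modules can make progress. The ungated test phase of B may run before A's tests finish, which is the extra concurrency rule B) allows.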

I'm still a bit unsure about B) above; it may be a bit limiting in terms
of other usage scenarios. I'm also a bit unsure how that'd fit in with all
the other activities in the lifecycle. An alternative would be to
make a declarative representation of phase interdependencies that could
express multiple types of concurrency interdependencies. (But I
consistently only see one dependency type:
upstreamMustFinishBeforeThisCanStart...?)

Would it float?

Kristian


On Sat., 21.11.2009 at 11.40 +0000, Stephen Connolly wrote:

In m3 (which is what we are talking about) AFAIK we can have a
listener that waits for the start of the deploy phase
and/or the end of execution.

With a customized install plugin, we could just install to the
"transaction" repository.  The listener can then block until the
criteria have been met (allowing other modules to progress) That would
achieve what you're after... namely, produce the artifacts for
consumption by the other modules before running test and
integration-test. Once the criteria have been met, we either fail the
module or we move the artifacts from the "transactional" local repo to
the real local repo and allow the lifecycle to continue.
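The "transactional" repository part could look roughly like the hypothetical Java class below (not an existing Maven API; all names are made up). Install writes go to a private staging directory; `commit()` moves staged files into the real local repository once the criteria are met, and `rollback()` deletes them if the module fails. The listener that decides when to call either method is left out. Each file is moved separately, so this keeps artifacts from failed builds out of the local repo but is not atomic across files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a staging ("transactional") local repository.
final class StagingRepository {
    private final Path staging;
    private final Path localRepo;

    StagingRepository(Path staging, Path localRepo) {
        this.staging = staging;
        this.localRepo = localRepo;
    }

    // Called by a customized install step instead of writing to localRepo.
    void install(String relativePath, byte[] content) {
        try {
            Path target = staging.resolve(relativePath);
            Files.createDirectories(target.getParent());
            Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Criteria met: move every staged file into the real local repository.
    // Files are moved one at a time, so this is not atomic across files.
    void commit() {
        for (Path file : list(Comparator.naturalOrder())) {
            if (!Files.isRegularFile(file)) {
                continue;
            }
            try {
                Path target = localRepo.resolve(staging.relativize(file));
                Files.createDirectories(target.getParent());
                Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Module failed: delete the staging directory, children before parents.
    void rollback() {
        for (Path path : list(Comparator.reverseOrder())) {
            try {
                Files.delete(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // All paths under staging (including staging itself) in the given order.
    private List<Path> list(Comparator<Path> order) {
        if (!Files.exists(staging)) {
            return List.of();
        }
        try (Stream<Path> walk = Files.walk(staging)) {
            return walk.sorted(order).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("repo-demo");
        StagingRepository repo = new StagingRepository(root.resolve("staging"), root.resolve("local"));
        repo.install("com/example/a/1.0/a-1.0.jar", new byte[] {1, 2, 3});
        repo.commit();
        System.out.println(Files.exists(root.resolve("local/com/example/a/1.0/a-1.0.jar")));
    }
}
```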

-Stephen

2009/11/21 Kristian Rosenvold <[email protected]>:

I seem to understand that there's room for several different
types of solution here:

Starting with the single-machine solution; I now understand that
you could start forking downstream builds straight after
compile in a reactor build, maybe after install in other cases.

In this scenario I think each module is dependent on all upstream
modules successfully achieving "install" before proceeding to "deploy".
I really think it's important to avoid leaking artifacts that do not
have their own (and all upstream) lifecycle requirements fulfilled.

When it comes to clustering there may be several approaches:
If you decide to publish artifacts through "deploy" to any kind
of repo, I believe they must have all lifecycle requirements met,
which, as I currently understand it, seems orthogonal to local
out-of-order execution.

Wouldn't it be feasible to distribute the "local" and perhaps
"transacted local" repo inside the cluster using network
file sharing? One would still have to solve serialization issues
and the use of installed artifacts in a reactor build...?

The clustering case seems like a much harder task than achieving
full local concurrency. I did some fairly extensive measurements
with my current build when I set up concurrent spring/junit testing:

Missing concurrency in classloading is the most important reason
why unit tests run slowly (classloading is strictly a synchronized
business until JDK 7). By running tests out of order on my local
unit test build, I am fairly certain I could reduce the run time
for "mvn clean install" to something much closer to that of
"mvn -Dmaven.test.skip=true clean install" (80->25 seconds in my case).
This is even before I start parallelizing the individual modules.

I must confess that I've yet to see a build that really needs
clustering for any reason other than running tests or other individual
tasks (javadoc, site etc). I think I'd be inclined to just distribute
those specific tasks in a cluster. If you actually had a decent model of
inter-lifecycle phase dependencies (requiredForStarting between phases),
you could probably achieve good results by keeping lifecycle execution
centralized but distributing plugin execution?

I suppose I may be narrow-minded on this last one...

I will start looking at the DefaultLifeCycleExecutor with out-of-order
execution in mind, and maybe dabble around a little.

Kristian

On Fri., 20.11.2009 at 06.29 -0800, Dan Fabulich wrote:

I've been meaning to reply to your earlier emails (it's been a busy week);
to this I'll just say that moving the "test" phase after the "install"
phase is a fascinating idea, which I personally like, but it seems like a
big violation of the contract for the lifecycle, and I suspect it won't be
popular. :-(

I've long felt that there should be a phase for testing after "install"
for similar reasons.  This might be SLIGHTLY more popular since users
would need to explicitly cause their tests to run during this phase.

What about users doing multi-machine builds?  Earlier this week I wrote
that users desiring to do multi-machine parallelism should deploy their
builds to a remote repository shared between the machines.  Should their
tests run post-deploy?

-Dan


Kristian Rosenvold wrote:

I've been thinking further about parallelity within Maven. The proposed
solution to MNG-3004 achieves parallelity by analyzing inter-module
dependencies and scheduling independent modules in parallel.

A simple further evolution of this would be to collect and download all
external dependencies for all modules immediately.

But this idea has been rummaging in my head while jogging for a week or so:

Would it be possible to achieve super-parallelity by describing
relationships between phases of the build, and even reordering some of the
phases ? I'll try to explain:

Assume that you can add transactional ACID (or maybe just AID) abilities
to the local repo for a full build. Simply put: all writes to the local
repo are done in a per-process instance of the repo that can be rolled
back if the build fails (or pushed to the local repo if the build is ok).

If you do that you can re-order the life-cycle for most builds to be
something like this:

validate
compile
package
install
test
integration-test
deploy

Notice that I just moved all the "test" phases after the "install" phase.
Theoretically you could start any subsequent modules immediately after
"install" is done. Running of tests is really the big killer in most
multi-module projects I see.

Since your commit "push" to the local repo only happens at the very end
of the build, you will not publish artifacts when tests are failing (at
least not project output artifacts).

You could actually make this a generic model that describes different
kinds of dependencies between lifecycle phases of different modules. The
dependency I immediately see is "requiredForStarting", which could be
interpreted as meaning that all upstream dependencies must have reached at
least that phase before the phase can be started for this project. I'm not
sure if there's any value in a generic model, but my perspective may be
limited to what I see on a daily basis.

Would this be feasible?
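The "requiredForStarting" rule can be stated as a small predicate. This is a Java sketch with illustrative names, assuming "reached at least that phase" means the upstream module has completed that phase, and using the reordered lifecycle from the list above (install before test).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "requiredForStarting": a phase of a module may start
// only when every upstream module has completed at least that phase.
final class RequiredForStarting {

    // Declaration order is the reordered lifecycle proposed above.
    enum Phase { VALIDATE, COMPILE, PACKAGE, INSTALL, TEST, INTEGRATION_TEST, DEPLOY }

    // reached: furthest phase each module has completed (absent = nothing completed yet).
    static boolean canStart(Phase phase, List<String> upstreamModules, Map<String, Phase> reached) {
        for (String up : upstreamModules) {
            Phase upstreamPhase = reached.get(up);
            // Enum compareTo follows declaration order.
            if (upstreamPhase == null || upstreamPhase.compareTo(phase) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Phase> reached = Map.of("A", Phase.INSTALL);
        System.out.println(canStart(Phase.INSTALL, List.of("A"), reached)); // true
        System.out.println(canStart(Phase.TEST, List.of("A"), reached));    // false
    }
}
```

With this ordering, once A has completed install, a downstream module B can work through compile, package and install while A is still running its tests, which is the overlap described above; B's own test phase would still wait for A's tests.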



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



