Steve Appling wrote:
> I am interested in ways to short-circuit task execution for the
> purpose of optimization. I would love to see some of this in 0.7 and
> would be glad to contribute.
> Here are some ideas:
> 1) Add an "onlyIf" method to Task that is given a closure. The closure
> would be executed before the first action of the task and would cancel
> execution of the task (with an appropriate lifecycle message) if it
> returned false. This closure would have as its delegate an optimization
> container with some helper methods that would provide more convenient
> access to change detection (among other things). Then you could do:
> mytask.onlyIf {
>     timestampChanged 'src/main/mysrc'
>     // or contentsChanged 'src/main/mysrc'
> }
I think this is a good idea.
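To pin down the proposed semantics, here is a minimal Java sketch, assuming a simplified task that is just a list of actions. SketchTask and wasSkipped() are hypothetical names for illustration, not Gradle's API; the Groovy closure becomes a java.util.function.Predicate, and the change detection helpers (timestampChanged, contentsChanged) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the proposed Task.onlyIf(): the predicate is
// evaluated once, before the first action, and a false result skips
// every action of the task.
class SketchTask {
    private final String name;
    private final List<Runnable> actions = new ArrayList<>();
    private Predicate<SketchTask> onlyIf = task -> true; // default: always run
    private boolean skipped;

    SketchTask(String name) { this.name = name; }

    SketchTask doLast(Runnable action) { actions.add(action); return this; }

    SketchTask onlyIf(Predicate<SketchTask> predicate) {
        this.onlyIf = predicate;
        return this;
    }

    void execute() {
        if (!onlyIf.test(this)) {
            skipped = true;
            System.out.println(":" + name + " SKIPPED"); // lifecycle message
            return;
        }
        for (Runnable action : actions) {
            action.run();
        }
    }

    boolean wasSkipped() { return skipped; }
}
```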
> 2) Running a clean should probably remove the change detection state
> information for a project (or at least the clean task should be able
> to be configured to do this conveniently).
I think the change detection mechanism should instead notice that the
output artifacts no longer exist.
One thing that clean should arguably get rid of is the internal
repository in $rootDir/.gradle. I wonder whether it should also clean the
buildSrc project.
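A rough sketch of that idea, assuming the change detection code already knows whether a task's inputs changed and which output files the task declares (OutputCheck and isUpToDate are hypothetical names): a task counts as up to date only if its inputs are unchanged and every declared output still exists, so the effect of a clean is picked up without deleting any stored state.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: rather than relying on clean to wipe the
// change detection state, the up-to-date check itself treats a task
// as out of date whenever any declared output artifact is missing.
final class OutputCheck {
    private OutputCheck() {}

    static boolean isUpToDate(boolean inputsChanged, List<Path> outputs) {
        if (inputsChanged) {
            return false;
        }
        for (Path output : outputs) {
            if (!Files.exists(output)) {
                return false; // e.g. removed by clean: the task must run again
            }
        }
        return true;
    }
}
```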
> 3) I would like some general way for tasks to indicate whether they did
> anything. Perhaps task.getDidWork(). BTW, I figured out how to do
> this for Gradle's use of ant.javac and can now tell if it really
> compiled anything.
When you say 'it really compiled anything' do you mean you can tell
whether the task decided to invoke javac or not?
I think it would be better if Gradle could figure out whether a task did
anything, rather than requiring the task writer to do anything. I think we
could assume that if a task executes any task action, it has done work.
If a task wants to do any short-circuiting, it would need to use an
onlyIf() predicate. In addition, if we provided an easy way for a task
to declare its output artifacts, then Gradle could also automatically
apply change detection to these output artifacts in order to decide
whether the task did any work.
So, instead of adding a Task.didWork property, perhaps we should merge
this concept with the existing Task.executed property into a single
read-only Task.state property, whose type is an enum with values
something like created, executed, or skipped.
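Here is a sketch of how such a merged state property might look, assuming (as above) that executing any task action counts as doing work. TaskState, StatefulTask and getState() are hypothetical names; treating a task with no actions as SKIPPED is an assumption of this sketch, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a single read-only state property replacing
// both Task.executed and a separate Task.didWork flag.
enum TaskState { CREATED, EXECUTED, SKIPPED }

class StatefulTask {
    private final List<Runnable> actions = new ArrayList<>();
    private BooleanSupplier onlyIf = () -> true;
    private TaskState state = TaskState.CREATED;

    StatefulTask doLast(Runnable action) { actions.add(action); return this; }

    StatefulTask onlyIf(BooleanSupplier predicate) { onlyIf = predicate; return this; }

    void execute() {
        if (state != TaskState.CREATED) {
            return; // each task runs at most once per build
        }
        // Assumption: running any action counts as doing work, so a task
        // vetoed by onlyIf, or one with no actions at all, is SKIPPED.
        if (!onlyIf.getAsBoolean() || actions.isEmpty()) {
            state = TaskState.SKIPPED;
            return;
        }
        actions.forEach(Runnable::run);
        state = TaskState.EXECUTED;
    }

    TaskState getState() { return state; }
}
```

A downstream check such as "did this task do any work?" then becomes simply getState() == TaskState.EXECUTED.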
> 4) I would like to be able to specify that, in a chain of dependent
> tasks, a task only executes if Task.didWork is true for all of its
> dependencies. Note that this is not always desired, so you need to be
> able to turn this on and off. I'm not sure of the best way to
> configure this. If we use the onlyIf method suggested above, it might
> take another closure, returned from a "needed" method, that performs
> this check. This would look like:
> myTask.onlyIf(needed())
> This probably should be the default for tests, but perhaps not for all
> tasks.
I'm not sure about this approach.
The tests should run if either the test classes or the classes under
test have changed since the last time we successfully ran the tests.
Arguably a change to the test runtime classpath should also cause the
tests to run. In other words, the tests should run only if their input
artifacts have changed since the last time we ran them. Checking
whether all the dependencies of the test task have executed or not is
only an approximation of this, and not a general solution. For example,
if I assemble my classes under test using, say, 2 independent Compile
tasks, then the test task should run if either task has done something.
Or, I may assemble my classes using some other build tool, so that
there's no task which we can use to check whether or not the classes
have changed.
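One way to express "run the tests if any input artifact changed" is to snapshot the contents of every input file after a successful run and compare snapshots next time. This is a hedged sketch: InputSnapshots is a hypothetical helper, it hashes with SHA-256 via java.security.MessageDigest, and it assumes the caller persists the previous snapshot somewhere between builds. Because it only looks at the files, it works no matter which task, or which other build tool, produced them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decide whether a task must run by comparing a
// content snapshot of ALL its input files (test classes, classes under
// test, runtime classpath...) with the snapshot from the last successful run.
final class InputSnapshots {
    private InputSnapshots() {}

    // Maps each input file to a hash of its contents; unreadable inputs throw.
    static Map<Path, String> snapshot(Collection<Path> inputs) {
        Map<Path, String> hashes = new HashMap<>();
        for (Path input : inputs) {
            try {
                hashes.put(input, sha256(Files.readAllBytes(input)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return hashes;
    }

    // Run if there is no previous snapshot, or if any input was added,
    // removed or modified since the previous snapshot was taken.
    static boolean inputsChanged(Map<Path, String> previous, Map<Path, String> current) {
        return previous == null || !previous.equals(current);
    }

    private static String sha256(byte[] bytes) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should always be available", e);
        }
    }
}
```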
To me, the key to task optimisation is to base it on the input and
output artifacts of a task. If we make it easy to declare both the input
and output artifacts of a task, we make the model much richer, and from
this we get a lot of goodness.
For example, if we know what the input artifacts for a task are, Gradle
can apply change detection to those input artifacts on the task's
behalf. If we also know which tasks produce those artifacts, then Gradle
can optimise the change detection. For example, when Gradle knows which
task produces a given artifact, it could simply use whether or not the
producer task executed an action to decide whether the input artifacts
have changed, and fall back to hashing, timestamps, a Java 7 file watcher
or whatever only when it doesn't know how the artifact is produced.
Similarly, it could use the fact that a Jar was downloaded by the
dependency management system to decide whether the input artifacts
have changed.
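That decision could look roughly like this. ArtifactChangeDetector is a hypothetical class; artifacts are identified by plain strings, the producer's state comes from whatever the task model records, and the fallback predicate stands in for hashing, timestamps, a file watcher, or a "was this Jar just downloaded?" check.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: decide whether an input artifact changed, preferring
// cheap knowledge from the model over inspecting the file system.
final class ArtifactChangeDetector {
    enum State { CREATED, EXECUTED, SKIPPED }

    private final Map<String, State> producerStateByArtifact; // artifacts with a known producer task
    private final Predicate<String> fallback;                 // e.g. hashing or timestamps

    ArtifactChangeDetector(Map<String, State> producerStateByArtifact, Predicate<String> fallback) {
        this.producerStateByArtifact = producerStateByArtifact;
        this.fallback = fallback;
    }

    boolean hasChanged(String artifact) {
        State producer = producerStateByArtifact.get(artifact);
        if (producer != null) {
            // Known producer: the artifact changed only if that task actually did work.
            return producer == State.EXECUTED;
        }
        // Unknown provenance (built by another tool, say): inspect the artifact itself.
        return fallback.test(artifact);
    }
}
```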
Adding input and output artifacts to the model also lets us use this
information to build the DAG, and to be smart about skipping tasks. For
example, if the test task were to declare that it uses the test classes
directory and the test runtime configuration as input artifacts, then
Gradle would be able to automatically add the tasks that produce these
(if any) to the task dependencies of the test task.
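A sketch of inferring task dependencies from declared artifacts, assuming each task declares its inputs and outputs as plain strings (DagBuilder, declare() and inferredDependencies() are hypothetical names). Inputs with no known producer, such as downloaded Jars, simply contribute no dependency.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive task dependencies from declared artifacts.
// A task depends on every other task that declares one of its inputs as an output.
final class DagBuilder {
    private final Map<String, Set<String>> inputs = new LinkedHashMap<>();
    private final Map<String, Set<String>> outputs = new LinkedHashMap<>();

    void declare(String task, Collection<String> taskInputs, Collection<String> taskOutputs) {
        inputs.put(task, new LinkedHashSet<>(taskInputs));
        outputs.put(task, new LinkedHashSet<>(taskOutputs));
    }

    Set<String> inferredDependencies(String task) {
        Set<String> deps = new LinkedHashSet<>();
        for (String artifact : inputs.getOrDefault(task, Collections.emptySet())) {
            for (Map.Entry<String, Set<String>> producer : outputs.entrySet()) {
                if (!producer.getKey().equals(task) && producer.getValue().contains(artifact)) {
                    deps.add(producer.getKey());
                }
            }
        }
        return deps; // inputs with no producer add nothing
    }
}
```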
Knowing which tasks produce and consume a given artifact also allows us
to extract concurrency constraints from the model. If 2 tasks both
contribute to the production of the same artifact (classes dir, say),
they should not run concurrently. Or if 2 tasks both consume the same
artifact, they should not run concurrently. And obviously a producer and
consumer task for a given artifact should not run concurrently.
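Taken literally, the three rules above mean that two tasks conflict whenever they touch a common artifact in any role. A minimal sketch under that reading, with ConcurrencyRules as a hypothetical name:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the concurrency rules described above. Because
// producer/producer, consumer/consumer and producer/consumer pairs all
// conflict, two tasks may run concurrently only if they share no artifact.
final class ConcurrencyRules {
    private ConcurrencyRules() {}

    static boolean canRunConcurrently(Set<String> inputsA, Set<String> outputsA,
                                      Set<String> inputsB, Set<String> outputsB) {
        Set<String> touchedByA = new HashSet<>(inputsA);
        touchedByA.addAll(outputsA);
        Set<String> touchedByB = new HashSet<>(inputsB);
        touchedByB.addAll(outputsB);
        touchedByA.retainAll(touchedByB); // artifacts both tasks produce or consume
        return touchedByA.isEmpty();
    }
}
```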
Extending this further, if we know the input and output artifacts of a
task, or subgraph of tasks, we can distribute the work to remote machines.
> Javac is already checking to see if the source files are out of date
> with the classes, so I don't think that the javac task needs to use
> the new change detection. A didWork flag would, however, let you stop
> other tasks in the chain (like test) if nothing needed to be compiled.
> (Unrelated: I would also like to see an option on compile to use Ant's
> depend task. I think the current dependencyTracking option doesn't
> work with the modern compiler.)
Other types of tasks could make good use of Tom's change detection.
> 5) We probably want a command-line option to disable all of
> these optimizations. Sometimes you really want to force a build with
> no optimizations (without running clean).
> In the race for speed, Gradle will probably never catch Ant in a clean
> build (at least while you are delegating most of the expensive stuff
> to Ant).
I wonder. The richer our model, the more scope we have to optimise
without the build script author or task author having to do anything
special. We can automatically extract parallelism. We can inline and
batch tasks. We can distribute bits of the build. We can reuse work that
other machines have already done.
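As one example of extracting parallelism automatically, here is a sketch that splits the task DAG into waves using Kahn's topological sort, level by level. Waves is a hypothetical helper; tasks within one wave have no dependency path between them, so they are candidates for running in parallel, although the artifact-based conflict rules would still apply.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: split a task DAG into "waves" (Kahn's algorithm,
// level by level). Tasks in the same wave do not depend on each other.
final class Waves {
    private Waves() {}

    // dependsOn maps each task to the tasks it depends on; every task must be a key.
    static List<Set<String>> compute(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> remaining = new HashMap<>(); // unfinished dependency counts
        dependsOn.forEach((task, deps) -> remaining.put(task, deps.size()));
        List<Set<String>> waves = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Set<String> wave = new TreeSet<>();
            remaining.forEach((task, count) -> { if (count == 0) wave.add(task); });
            if (wave.isEmpty()) {
                throw new IllegalArgumentException("dependency cycle among " + remaining.keySet());
            }
            waves.add(wave);
            wave.forEach(remaining::remove);
            // Each remaining task loses one pending dependency per task finished in this wave.
            for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
                if (remaining.containsKey(e.getKey())) {
                    long done = e.getValue().stream().filter(wave::contains).count();
                    remaining.merge(e.getKey(), (int) -done, Integer::sum);
                }
            }
        }
        return waves;
    }
}
```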
Adam