Re: [gradle-dev] Task Optimization

Adam Murdoch Thu, 25 Jun 2009 03:03:17 -0700


Steve Appling wrote:

I am interested in ways to short circuit task execution for the purpose of optimization. I would love to see some of this in 0.7 and would be glad to contribute.
Here are some ideas:
1) Add an "onlyIf" method to Task that is given a closure. The closure would be executed before the first action of the task and would cancel execution of the task (with appropriate lifecycle message) if it returned false. This closure would have as a delegate an optimization container with some helper methods that would provide more convenient access to change detection (among other things). Then you could do:
  mytask.onlyIf {
    timestampChanged 'src/main/mysrc'
    // or contentsChanged 'src/main/mysrc'
  }


I think this is a good idea.
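
To make the proposal concrete, here is a minimal Java sketch of how such a predicate could gate a task's actions. Everything in it (SketchTask, doLast, the skipped flag, the printed message) is a hypothetical stand-in for illustration, not Gradle's actual Task API, and the change detection helpers from the proposal are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a task; not Gradle's actual Task type. */
class SketchTask {
    final String name;
    private final List<Runnable> actions = new ArrayList<>();
    // Assumption: a task with no predicate always runs.
    private BooleanSupplier onlyIf = () -> true;
    boolean skipped;

    SketchTask(String name) {
        this.name = name;
    }

    SketchTask onlyIf(BooleanSupplier predicate) {
        this.onlyIf = predicate;
        return this;
    }

    SketchTask doLast(Runnable action) {
        actions.add(action);
        return this;
    }

    /** Evaluates the predicate before the first action; returns true if the actions ran. */
    boolean execute() {
        if (!onlyIf.getAsBoolean()) {
            skipped = true;
            System.out.println(name + " SKIPPED (onlyIf returned false)");
            return false;
        }
        for (Runnable action : actions) {
            action.run();
        }
        return true;
    }
}
```

With this shape, something like new SketchTask("mytask").onlyIf(() -> sourcesChanged) never reaches its actions when the predicate returns false, where sourcesChanged stands for whatever change check the build supplies.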

2) Running a clean should probably remove the change detection state information for a project (or at least the clean task should be able to be configured to do this conveniently).

I think the change detection mechanism should figure out that the output artifacts don't exist any more instead.

One thing that clean should arguably get rid of is the internal repository in $rootDir/.gradle. I wonder if it should also clean the buildSrc project?

3) I would like some general way for tasks to indicate that they did anything. Perhaps task.getDidWork(). BTW, I figured out how to do this for gradle's use of ant.javac and can now tell if it really compiled anything.

When you say 'it really compiled anything' do you mean you can tell whether the task decided to invoke javac or not?

I think it would be better if Gradle could figure out whether a task did anything, rather than require the task writer to do anything. I think we could assume that if a task executes any task action, it has done work. If a task wants to do any short-circuiting, it would need to use an onlyIf() predicate. In addition, if we provided an easy way for a task to declare its output artifacts, then Gradle could additionally apply change detection to these output artifacts automatically in order to decide whether the task did any work.

So, instead of adding a Task.didWork property, perhaps we should merge this concept with the existing Task.executed property into a single read-only Task.state property with an enum with values something like: created, executed, or skipped.
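
A rough Java sketch of what that single state property might look like. The names TaskState and StatefulTask are made up for illustration, and the rule that "executed an action" means "did work" is the assumption described above, not existing Gradle behaviour.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical replacement for separate 'executed' and 'didWork' flags. */
enum TaskState { CREATED, EXECUTED, SKIPPED }

/** Illustrative task whose state is derived by the build tool, not set by the task author. */
class StatefulTask {
    private TaskState state = TaskState.CREATED;
    private final BooleanSupplier onlyIf;
    private final Runnable action;

    StatefulTask(BooleanSupplier onlyIf, Runnable action) {
        this.onlyIf = onlyIf;
        this.action = action;
    }

    /** Read-only view of the state; there is deliberately no setter. */
    TaskState getState() {
        return state;
    }

    void execute() {
        if (onlyIf.getAsBoolean()) {
            action.run();
            // Assumption: running any action counts as having done work.
            state = TaskState.EXECUTED;
        } else {
            state = TaskState.SKIPPED;
        }
    }
}
```

The point of the enum is that callers ask one question (getState()) instead of combining two booleans whose combinations are not all meaningful.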

4) I would like to be able to specify that a chain of dependent tasks only execute a task if Task.didWork is true for all of its dependents. Note that this is not always desired, so you need to be able to turn this on and off. I'm not sure of the best way to configure this. If we use the onlyIf method suggested above, it might take another closure to check this that would be returned from a "needed" method. This would look like:
  myTask.onlyIf(needed())
This probably should be the default for tests, but perhaps not for all Tasks.


I'm not sure about this approach.

The tests should run if either the test classes or the classes under test have changed since last time we successfully ran the tests. Arguably a change to the test runtime classpath should also cause the tests to run. In other words, the tests should be run only if the input artifacts have changed since last time we ran the tests. Checking whether all the dependencies of the test task have executed or not is only an approximation of this, and not a general solution. For example, if I assemble my classes under test using, say, 2 independent Compile tasks, then the test task should run if either task has done something. Or, I may assemble my classes using some other build tool, so that there's no task which we can use to check whether or not the classes have changed.

To me, the key to task optimisation is to base it on the input and output artifacts of a task. If we make it easy to declare both the input and output artifacts of a task, we make the model much richer, and from this we get a lot of goodness.

For example, if we know what the input artifacts for a task are, Gradle can apply change detection to those input artifacts on the task's behalf. If we also know which tasks produce those artifacts, then Gradle can optimise the change detection. Gradle could, for example, when it knows which task produces a given artifact, simply use the fact that the producer task executed an action or not to decide whether the input artifacts have changed, and only fall back to hashing or timestamps or a Java 7 file watcher or whatever when it doesn't know how the artifact is produced. Similarly, it could use the fact that a Jar was downloaded by the dependency management system to decide whether the input artifacts have changed.
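
As a sketch of that two-tier check, assuming single-file artifacts and an in-memory record of previous hashes (a real implementation would need to handle directories and persist its state between builds). InputChangeDetector and its methods are hypothetical names, and SHA-256 is just one possible choice of hash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical change detector: trust the producer task if known, else hash the file. */
class InputChangeDetector {
    private final Map<Path, String> lastSeenHash = new HashMap<>();
    private final Map<Path, Boolean> producerDidWork = new HashMap<>();

    /** Called when a task known to produce this artifact has finished. */
    void recordProducerResult(Path artifact, boolean didWork) {
        producerDidWork.put(artifact, didWork);
    }

    boolean hasChanged(Path artifact) throws IOException {
        Boolean didWork = producerDidWork.get(artifact);
        if (didWork != null) {
            // Cheap path: the producer's outcome answers the question.
            return didWork;
        }
        // Fallback: unknown producer, so compare content hashes.
        String hash = sha256(artifact);
        String previous = lastSeenHash.put(artifact, hash);
        return !hash.equals(previous); // a first sighting counts as changed
    }

    static String sha256(Path file) throws IOException {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The fallback could equally be timestamps or a file watcher; the structure stays the same, only the body of the second branch changes.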

Adding input and output artifacts to the model also lets us use this information to build the DAG, and to be smart about skipping tasks. For example, if the test task were to declare that it uses the tests classes directory and the test runtime configuration as input artifacts, then Gradle would be able to automatically add the tasks that produce these (if any) to the task dependencies of the test task.
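
Roughly, the dependency inference could work like the following Java sketch. Artifacts are modelled as plain strings, and ArtifactTask, consumes, produces and DependencyInference are invented names for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical task that declares what it consumes and what it produces. */
class ArtifactTask {
    final String name;
    final Set<String> inputs = new LinkedHashSet<>();
    final Set<String> outputs = new LinkedHashSet<>();

    ArtifactTask(String name) {
        this.name = name;
    }

    ArtifactTask consumes(String... artifacts) {
        inputs.addAll(Arrays.asList(artifacts));
        return this;
    }

    ArtifactTask produces(String... artifacts) {
        outputs.addAll(Arrays.asList(artifacts));
        return this;
    }
}

class DependencyInference {
    /** Maps each task name to the names of the tasks that produce its inputs. */
    static Map<String, Set<String>> inferDependencies(List<ArtifactTask> tasks) {
        Map<String, Set<String>> producers = new HashMap<>();
        for (ArtifactTask task : tasks) {
            for (String output : task.outputs) {
                producers.computeIfAbsent(output, k -> new LinkedHashSet<>()).add(task.name);
            }
        }
        Map<String, Set<String>> dependencies = new LinkedHashMap<>();
        for (ArtifactTask task : tasks) {
            Set<String> deps = new LinkedHashSet<>();
            for (String input : task.inputs) {
                // Inputs with no known producer simply contribute no dependency.
                deps.addAll(producers.getOrDefault(input, Collections.<String>emptySet()));
            }
            deps.remove(task.name); // a task never depends on itself
            dependencies.put(task.name, deps);
        }
        return dependencies;
    }
}
```

Given a compile task that produces the test classes directory and a test task that consumes it, inferDependencies would report that test depends on the compile task without either task naming the other.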

Knowing which tasks produce and consume a given artifact also allows us to extract concurrency constraints from the model. If 2 tasks both contribute to the production of the same artifact (classes dir, say), they should not run concurrently. Or if 2 tasks both consume the same artifact, they should not run concurrently. And obviously a producer and consumer task for a given artifact should not run concurrently.
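
Those three rules reduce to a simple overlap check. A minimal Java sketch, again with artifacts as plain strings and with IoTask and ConcurrencyRules as invented names:

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical task description: just a name plus declared inputs and outputs. */
class IoTask {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    IoTask(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

class ConcurrencyRules {
    /** Applies the three constraints above; true only when no artifact is shared. */
    static boolean mayRunConcurrently(IoTask a, IoTask b) {
        boolean shareOutput = !Collections.disjoint(a.outputs, b.outputs);
        boolean shareInput = !Collections.disjoint(a.inputs, b.inputs);
        boolean producerConsumer = !Collections.disjoint(a.outputs, b.inputs)
                || !Collections.disjoint(a.inputs, b.outputs);
        return !(shareOutput || shareInput || producerConsumer);
    }
}
```

A scheduler could call mayRunConcurrently on each pair of ready tasks and only start those that do not conflict with anything already running.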

Extending this further, if we know the input and output artifacts of a task, or subgraph of tasks, we can distribute the work to remote machines.

Javac is already checking to see if the source files are out of date with the classes, so I don't think that the javac task needs to use the new change detection. This would, however, let you stop other tasks in the chain (like test) if nothing needed to be compiled. (Unrelated: I would also like to see an option on compile to use Ant's depend task. I think the current dependencyTracking option doesn't work with the modern compiler.)
Other types of tasks could make good use of Tom's change detection.
5) We probably want a command line option to be able to disable all of these optimizations. Sometimes you really want to force a build with no optimizations (without running clean).
In the race for speed, Gradle will probably never catch Ant in a clean build (at least while you are delegating most of the expensive stuff to ant).

I wonder. The richer our model, the more scope we have to optimise without the build script author or task author having to do anything special. We can automatically extract parallelism. We can inline and batch tasks. We can distribute bits of the build. We can reuse work that other machines have already done.



Adam

