Re: [gradle-dev] Task Optimization

Steve Appling Thu, 25 Jun 2009 05:14:03 -0700


Adam Murdoch wrote:

Steve Appling wrote:
I am interested in ways to short circuit task execution for the purpose of optimization. I would love to see some of this in 0.7 and would be glad to contribute.
Here are some ideas:
1) Add an "onlyIf" method to Task that is given a closure. The closure would be executed before the first action of the task and would cancel execution of the task (with appropriate lifecycle message) if it returned false. This closure would have as a delegate an optimization container with some helper methods that would provide more convenient access to change detection (among other things). Then you could do:
  mytask.onlyIf {
    timestampChanged 'src/main/mysrc'
    // or contentsChanged 'src/main/mysrc'
  }
I think this is a good idea.
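
The onlyIf behaviour described in 1) could be sketched in plain Java roughly as below. This is only an illustration under stated assumptions: SketchTask, doLast and wasSkipped are hypothetical names, not the real Gradle Task API, and the change detection helpers (timestampChanged, contentsChanged) are replaced by an arbitrary boolean predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a Gradle task; not the real Task API.
class SketchTask {
    private final String name;
    private final List<BooleanSupplier> predicates = new ArrayList<>();
    private final List<Runnable> actions = new ArrayList<>();
    private boolean skipped;

    SketchTask(String name) {
        this.name = name;
    }

    // Register a predicate that must return true for the task to run.
    SketchTask onlyIf(BooleanSupplier predicate) {
        predicates.add(predicate);
        return this;
    }

    // Register an action to run when the task executes.
    SketchTask doLast(Runnable action) {
        actions.add(action);
        return this;
    }

    // Evaluate the predicates before the first action; if any returns false,
    // report the skip and run none of the actions.
    void execute() {
        for (BooleanSupplier predicate : predicates) {
            if (!predicate.getAsBoolean()) {
                skipped = true;
                System.out.println(":" + name + " SKIPPED");
                return;
            }
        }
        skipped = false;
        for (Runnable action : actions) {
            action.run();
        }
    }

    boolean wasSkipped() {
        return skipped;
    }
}
```

With this shape, new SketchTask("mytask").onlyIf(() -> false) would print a skip message and run no actions, which is the short circuit being asked for.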
2) Running a clean should probably remove the change detection state information for a project (or at least the clean task should be able to be configured to do this conveniently).
I think the change detection mechanism should figure out that the output artifacts don't exist any more instead.
One thing that clean should arguably get rid of is the internal repository in $rootDir/.gradle. I wonder if it should also clean the buildSrc project?
3) I would like some general way for tasks to indicate that they did anything. Perhaps task.getDidWork(). BTW, I figured out how to do this for gradle's use of ant.javac and can now tell if it really compiled anything.
When you say 'it really compiled anything' do you mean you can tell whether the task decided to invoke javac or not?

Ant's javac scans the source and class files itself to see if any source files are newer than the corresponding class files. If so, it then calls Java's javac with this list of outdated files. After executing the gradle task, I can determine which files were actually passed to Java's javac by ant. For several types of tasks (compile, groovycompile, copy, directory, zip, jar, tar), the task is already doing its own optimization by comparing source timestamps to some target during execution. It is possible to execute the task without it having any side effects. Since most of them have the information about what they actually did, it seems better (and faster) to use this information instead of scanning source / output a second time externally to see what changed.
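
A rough Java sketch of a task that does its own timestamp check and then reports whether it did anything might look like this. SelfOptimizingCompile is a hypothetical class, the timestamps live in maps rather than on disk, and the "compile" step just records a new class timestamp; none of this is Ant's or Gradle's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical compile-like task; timestamps are held in maps instead of real files.
class SelfOptimizingCompile {
    final Map<String, Long> sourceStamps = new HashMap<>();
    final Map<String, Long> classStamps = new HashMap<>();
    private boolean didWork;

    // A source is outdated if it has no class file yet,
    // or if its class file is older than the source.
    List<String> outdatedSources() {
        List<String> outdated = new ArrayList<>();
        for (Map.Entry<String, Long> entry : sourceStamps.entrySet()) {
            Long compiledAt = classStamps.get(entry.getKey());
            if (compiledAt == null || compiledAt < entry.getValue()) {
                outdated.add(entry.getKey());
            }
        }
        return outdated;
    }

    // The action always runs, but only outdated sources are "compiled".
    // The task already knows what it did, so it records that as didWork.
    void execute(long now) {
        List<String> outdated = outdatedSources();
        for (String source : outdated) {
            classStamps.put(source, now);
        }
        didWork = !outdated.isEmpty();
    }

    boolean getDidWork() {
        return didWork;
    }
}
```

The point of the sketch is that getDidWork() falls out of information the task computed anyway, so no second scan of sources and outputs is needed.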

I think it would be better if Gradle could figure out whether a task did anything, rather than require the task writer to do anything.

I would like this, but I'm not sure how to accomplish it in the general case. Tasks may have input/output other than just a set of files (like network operations, web services calls, deploy over webdav). Even tasks like copy may do the work in a way that makes it hard to see what happened after the fact. I know that we have several tasks which have output that is put into the same directory with the output from other tasks. It would not be sufficient to just scan the output directories after each execution since they would also include the results from other tasks. If you never allow parallel execution, then you could scan the output directories both before and after a task's execution, but this seems expensive. If the task already knows what it did, why not make use of that information?

For custom tasks (instances of DefaultTask), it seems simpler for a build writer to set some state to indicate if they did anything than to specify the set of files to check. If this check is best done by comparing files, then we should provide easy ways to call into the change detection code to set this state.

I think we could assume that if a task executes any task action, it has done work.

I don't think this is true. As I discussed above, there are many tasks (like compile) that execute their task action, but decide during execution to not cause any side effects.

If a task wants to do any short-circuiting, it would need to use an onlyIf() predicate. In addition, if we provided any easy way for a task to declare its output artifacts, then Gradle can additionally automatically apply change detection to these output artifacts in order to decide whether the task did any work.
So, instead of adding a Task.didWork property, perhaps we should merge this concept with the existing Task.executed property into a single read-only Task.state property with an enum with values something like: created, executed, or skipped.

I think you should be able to distinguish executed and did something from executed and didn't do anything.
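
One way to reconcile the single state property with this distinction is to split "executed" into two values. The enum below is a sketch only: the thread proposes just created, executed and skipped, so UP_TO_DATE, DID_WORK and the afterRun helper are hypothetical names chosen for illustration.

```java
// Hypothetical names; the thread itself only proposes created, executed and skipped.
enum SketchTaskState {
    CREATED,     // task exists but has not been run yet
    SKIPPED,     // a predicate such as onlyIf returned false, no action ran
    UP_TO_DATE,  // actions ran but decided there was nothing to do
    DID_WORK;    // actions ran and changed something

    // True when the task's actions ran, whether or not they changed anything.
    boolean isExecuted() {
        return this == UP_TO_DATE || this == DID_WORK;
    }

    // Derive the state after a run from the two facts discussed in the thread.
    static SketchTaskState afterRun(boolean predicatePassed, boolean didWork) {
        if (!predicatePassed) {
            return SKIPPED;
        }
        return didWork ? DID_WORK : UP_TO_DATE;
    }
}
```

A caller that only cares whether the task ran can use isExecuted(), while a downstream check can compare against DID_WORK.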

4) I would like to be able to specify that a chain of dependent tasks only execute a task if Task.didWork is true for all of its dependents. Note that this is not always desired, so you need to be able to turn this on and off. I'm not sure of the best way to configure this. If we use the onlyIf method suggested above, it might take another closure to check this that would be returned from a "needed" method. This would look like:
  myTask.onlyIf(needed())
This probably should be the default for tests, but perhaps not for all Tasks.
I'm not sure about this approach.

After trying to implement some of this, I no longer like all of this approach either. I don't think there is anything appropriate to do "for a chain of dependent tasks". I do still like the general idea of onlyIf { isNeeded() }. I think that isNeeded may be a good place to contain any mechanism for Gradle to automatically determine if artifacts it depends on changed or tasks it depends on did work.

The tests should run if either the test classes or the classes under test have changed since last time we successfully ran the tests. Arguably a change to the test runtime classpath should also cause the tests to run. In other words, the tests should be run only if the input artifacts have changed since last time we ran the tests. Checking whether all the dependencies of the test task have executed or not is only an approximation of this, and not a general solution. For example, if I assemble my classes under test using, say, 2 independent Compile tasks, then the test task should run if either task has done something. Or, I may assemble my classes using some other build tool, so that there's no task which we can use to check whether or not the classes have changed.
To me, the key to task optimisation is to base it on the input and output artifacts of a task. If we make it easy to declare both the input and output artifacts of a task, we make the model much richer, and from this we get a lot of goodness.
For example, if we know what the input artifacts for a task are, Gradle can apply change detection to those input artifacts on the task's behalf. If we also know which tasks produce those artifacts, then Gradle can optimise the change detection. Gradle could, for example, when it knows which task produces a given artifact, simply use the fact that the producer task executed an action or not to decide whether the input artifacts have changed, and only fall back to hashing or timestamps or a Java 7 file watcher or whatever when it doesn't know how the artifact is produced. Similarly, it could use the fact that a Jar was downloaded by the dependency management system to decide whether the input artifacts have changed.
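
The two paths described here (trust the producer task when it is known, otherwise fall back to hashing) could be sketched as follows. InputChangeDetector is a hypothetical helper, artifact contents are passed in as byte arrays rather than read from disk, and SHA-256 is just one possible choice of hash; none of this reflects an existing Gradle API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; artifact contents are passed in as bytes rather than read from disk.
class InputChangeDetector {
    private final Map<String, byte[]> lastHashes = new HashMap<>();

    // Fallback path: hash the contents and compare with the hash recorded last time.
    // An artifact that has never been seen counts as changed.
    boolean contentsChanged(String artifact, byte[] contents) {
        byte[] hash = sha256(contents);
        byte[] previous = lastHashes.put(artifact, hash);
        return previous == null || !Arrays.equals(previous, hash);
    }

    // Shortcut path: when the producer task is known, trust whether it did work.
    // A null producerDidWork means the producer is unknown, so fall back to hashing.
    // Note: the shortcut does not update the recorded hash in this sketch.
    boolean inputChanged(String artifact, Boolean producerDidWork, byte[] contents) {
        if (producerDidWork != null) {
            return producerDidWork;
        }
        return contentsChanged(artifact, contents);
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A timestamp comparison or a file watcher could be swapped in for the hashing fallback without changing the shape of inputChanged.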
Adding input and output artifacts to the model also lets us use this information to build the DAG, and to be smart about skipping tasks. For example, if the test task were to declare that it uses the tests classes directory and the test runtime configuration as input artifacts, then Gradle would be able to automatically add the tasks that produce these (if any) to the task dependencies of the test task.
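
Deriving task dependencies from declared artifacts could be sketched like this. ArtifactGraph, declare and inferredDependencies are hypothetical names, and tasks and artifacts are identified by plain strings; how a real build script would declare inputs and outputs is left open.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: tasks and artifacts are identified by plain strings.
class ArtifactGraph {
    private final Map<String, Set<String>> inputs = new HashMap<>();
    private final Map<String, Set<String>> producers = new HashMap<>();

    // Declare a task together with the artifacts it consumes and produces.
    void declare(String task, Set<String> taskInputs, Set<String> taskOutputs) {
        inputs.put(task, taskInputs);
        for (String artifact : taskOutputs) {
            producers.computeIfAbsent(artifact, key -> new TreeSet<>()).add(task);
        }
    }

    // A task depends on every other task that produces one of its input artifacts.
    // Inputs with no known producer (for example plain source directories)
    // contribute no dependency.
    Set<String> inferredDependencies(String task) {
        Set<String> dependencies = new TreeSet<>();
        for (String artifact : inputs.getOrDefault(task, Collections.emptySet())) {
            dependencies.addAll(producers.getOrDefault(artifact, Collections.emptySet()));
        }
        dependencies.remove(task);
        return dependencies;
    }
}
```

If a test task declares the classes and test classes directories as inputs, the tasks that declared those directories as outputs come back as its dependencies, with no explicit dependsOn in the build script.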
Knowing which tasks produce and consume a given artifact also allows us to extract concurrency constraints from the model. If 2 tasks both contribute to the production of the same artifact (classes dir, say), they should not run concurrently. Or if 2 tasks both consume the same artifact, they should not run concurrently. And obviously a producer and consumer task for a given artifact should not run concurrently.
Extending this further, if we know the input and output artifacts of a task, or subgraph of tasks, we can distribute the work to remote machines.

I think it might be a good approach to first add support for the onlyIf clause and some helpers to allow manual use of optimization and then investigate techniques to allow Gradle to be smarter about this and do more automatically. If Gradle just adds optimization rules to tasks in the built in plugins and doesn't provide automated optimization for custom tasks you will still get a lot of benefit.

I generally like the idea of a richer model that has information about what each task consumes and produces, but I'm not clear exactly how this would be specified. I don't want to require the build writer to duplicate information about what the task inputs / outputs are. I would love to see some examples of how this would work for general tasks.

Javac is already checking to see if the source files are out of date with the classes, so I don't think that the javac task needs to use the new change detection. This would, however, let you stop other tasks in the chain (like test) if nothing needed to be compiled. (unrelated: I would also like to see an option on compile to use Ant's depend task. I think the current dependencyTracking option doesn't work with the modern compiler.)
Other types of tasks could make good use of Tom's change detection.
5) We probably want a command line option to be able to disable all of these optimizations. Sometimes you really want to force a build with no optimizations (without running clean).
In the race for speed, Gradle will probably never catch Ant in a clean build (at least while you are delegating most of the expensive stuff to ant).
I wonder. The richer our model, the more scope we have to optimise without the build script author or task author having to do anything special. We can automatically extract parallelism. We can inline and batch tasks. We can distribute bits of the build. We can reuse work that other machines have already done.
Adam


--
Steve Appling
Automated Logic Research Team
