[gradle-dev] fixing the configuration model

Adam Murdoch Thu, 05 Sep 2013 17:24:14 -0700

Hi,

Luke and I have been discussing ways that we can improve Gradle's configuration 
model - to get rid of timing weirdness, to give better diagnostics, to do less 
work, to do things in parallel, and so on.


We've started by recognising two things: Firstly, there's no real difference 
between configuring (or building) a model and building some files. And 
secondly, there is a very powerful approach that we already use for building 
files - that is, tasks.

You can generalise what a task is, and think about it as a function that 
takes a set of inputs and produces some outputs. These inputs and outputs are 
declared, so that we know what they are without having to execute the function. 
And this means, since we know the inputs and outputs, we can order execution so 
that a function is not executed until its inputs are available. Also - very 
importantly - we can order execution so that the inputs of a function are 
immutable, so that everything else that affects that input has already been 
executed.
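
To make the ordering idea concrete, here is a rough sketch in Python. It is not Gradle code: the function names, the input and output names and the execution_order helper are all made up for illustration. Each function only declares what it reads and writes, and the order is worked out from those declarations without running anything. A function depends on every producer of each of its inputs, which is what lets it treat those inputs as immutable when it runs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarations: name -> (inputs, outputs).
# These are plain data, so nothing has to execute for us to read them.
functions = {
    "compile": ({"sources"}, {"classes"}),
    "jar": ({"classes", "manifest"}, {"archive"}),
    "writeManifest": (set(), {"manifest"}),
}


def execution_order(functions):
    """Order functions so each runs after everything that produces its inputs."""
    # Map each output to the set of functions that produce or change it.
    producers = {}
    for name, (_, outputs) in functions.items():
        for out in outputs:
            producers.setdefault(out, set()).add(name)

    # A function depends on ALL producers of each of its inputs, so an input
    # can no longer change once the function runs. Inputs with no producer
    # (such as "sources") are treated as already available.
    graph = {
        name: {p for inp in inputs for p in producers.get(inp, ())}
        for name, (inputs, _) in functions.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(functions)
```

In this sketch "jar" always lands after both "compile" and "writeManifest"; the relative order of those two is left open because neither depends on the other.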

You can also think about a build script or plugin in a similar way: these are a 
set of functions, each of which takes some inputs and produces some outputs. 
Usually, but certainly not always, these inputs and outputs are model elements 
(where a task is just another model element). So, let's say we add some way for 
Gradle to infer the inputs and outputs of each such configuration function. 
Then the goodness we have for tasks can also be used for configuration.

So when a configuration function executes, Gradle would make sure that its 
inputs are available and immutable. For example, say I'm implementing a plugin 
where I don't know which tasks to create until after my model has been 
configured (e.g. an Android plugin or a C++ plugin). I can structure my plugin 
so that I have a configuration function that takes my model as input and 
produces tasks as output. Gradle won't call this function until the model has 
been completely configured - whether that's by a build script, or some custom 
plugin, or injected by an init script or some combination. As the plugin author 
I don't know or care where that configuration happens, only that it has 
happened, and that it won't happen after I've used the model.
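
A minimal sketch of that guarantee, again in Python and with invented names (ModelRegistry, configure and consume are hypothetical, not a Gradle API): mutations are collected from wherever they come from, applied before the first consumer runs, and rejected afterwards.

```python
from types import MappingProxyType


class ModelRegistry:
    """Hypothetical registry: collects mutations, then gives consumers a read-only view."""

    def __init__(self):
        self._data = {}
        self._mutators = []
        self._closed = False

    def configure(self, mutator):
        # Build scripts, plugins and init scripts would all register here.
        if self._closed:
            raise RuntimeError("model has already been used and cannot be configured")
        self._mutators.append(mutator)

    def consume(self, function):
        # Apply every pending mutation first, then close the model for good.
        if not self._closed:
            for mutator in self._mutators:
                mutator(self._data)
            self._closed = True
        # The proxy is only a shallow read-only view; a real implementation
        # would need to protect nested values as well.
        return function(MappingProxyType(self._data))


registry = ModelRegistry()
registry.configure(lambda m: m.setdefault("variants", []).append("debug"))    # e.g. build script
registry.configure(lambda m: m.setdefault("variants", []).append("release"))  # e.g. another plugin

# The plugin's function: fully configured model in, task names out.
tasks = registry.consume(lambda m: ["compile" + v.capitalize() for v in m["variants"]])
```

The plugin's function never sees a half-configured model, and any call to configure after consume fails loudly instead of being silently ignored.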

There are some more useful things that knowing a task's inputs and outputs gives 
us: We only run the tasks that are required to produce the requested outputs, 
and we can run unrelated tasks in parallel. The same would be true of 
configuration functions.
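
As an illustration only (Python, with hypothetical function and model names), working out which functions are required is a walk backwards from the requested outputs through the declared inputs:

```python
def required_functions(functions, requested):
    """Return the names of the functions needed to produce the requested outputs."""
    # Map each output to the functions that produce it.
    producers = {}
    for name, (_, outputs) in functions.items():
        for out in outputs:
            producers.setdefault(out, set()).add(name)

    needed, pending = set(), list(requested)
    while pending:
        item = pending.pop()
        for name in producers.get(item, ()):
            if name not in needed:
                needed.add(name)
                pending.extend(functions[name][0])  # its inputs are now needed too
    return needed


# Hypothetical configuration functions: name -> (inputs, outputs).
functions = {
    "configureJava": ({"buildScript"}, {"javaModel"}),
    "createCompileTasks": ({"javaModel"}, {"compileTasks"}),
    "configurePublishing": ({"buildScript"}, {"publications"}),
    "createPublishTasks": ({"publications"}, {"publishTasks"}),
}

needed = required_functions(functions, {"compileTasks"})
```

Asking for the compile tasks never touches the publishing functions here. The two chains share no outputs, which is also what would make them safe to run in parallel.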

For tasks, we also short-circuit a task whose inputs and outputs have not 
changed since last time it was run. We can do this because we persist the 
inputs and outputs, or at least enough about them to know if they've changed or 
not. If we were to persist the inputs and outputs of configuration functions, 
then we'd be able to do the same for configuration.
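
A sketch of that short-circuit, in Python and under some loud assumptions: models are JSON-serialisable, the history dictionary stands in for state persisted between builds, and only inputs are checked (a fuller version would also verify that the recorded outputs are still intact). UpToDateRunner and fingerprint are invented names.

```python
import hashlib
import json


def fingerprint(value):
    """Stable hash of a JSON-serialisable value (an assumption about models)."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class UpToDateRunner:
    def __init__(self):
        # Stand-in for persisted state: name -> (input fingerprint, outputs).
        self.history = {}
        self.executed = []

    def run(self, name, function, inputs):
        key = fingerprint(inputs)
        previous = self.history.get(name)
        if previous is not None and previous[0] == key:
            return previous[1]  # inputs unchanged: reuse the recorded outputs
        outputs = function(inputs)
        self.executed.append(name)
        self.history[name] = (key, outputs)
        return outputs


def make_tasks(model):
    # Hypothetical configuration function: model in, task names out.
    return ["compile" + arch for arch in model["architectures"]]


runner = UpToDateRunner()
first = runner.run("createTasks", make_tasks, {"architectures": ["x86", "arm"]})
second = runner.run("createTasks", make_tasks, {"architectures": ["x86", "arm"]})  # skipped
third = runner.run("createTasks", make_tasks, {"architectures": ["x86"]})          # re-executed
```

The second call returns the recorded outputs without calling make_tasks; the third sees a different fingerprint and runs again.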

Things get very interesting at this point. We'd be serialising the models, as 
these are the inputs and outputs of configuration functions. And because we can 
serialise them, we can ship models between threads, and jvms, and over time. 
This means, for example:

- Task-level parallel execution.
- Distributed configuration and execution.
- Configuration and execution can happen in different jvms. For example, I can 
configure my C++ component model locally, and then run the compile and link 
tasks across a bunch of machines, one for each architecture that I build.
- Tasks can run in an isolated jvm. For example, I might run all my compile 
tasks in a separate warmed-up jvm.
- I can spread my build over many jvms to put together a massive (logical) heap.
- We can spool models out to disk when heap space is tight.
- We can ship models between builds. For example, my CI build might publish the 
entire build model along with the artefacts. My dev build can download this 
model from the repository and use it, instead of running all the configuration 
logic.
- Fast tooling, as we only need to run the configuration logic that has changed.
- Keep the model up-to-date in the daemon, in the background.
- Implement some really nice build reproducibility. To reproduce a build, I can 
download the model that was used for that build, and set up all the inputs so 
that they are exactly the same as those used last time (and/or let the user 
know what's different or could not be made the same). And this would work for 
custom models, not just core models.
- Implement some nice reporting and auditing: What inputs have changed between 
these two builds? What inputs were used to build this thing? I can answer these 
kinds of questions without running the build.
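
To illustrate the last point: once the inputs of a build are persisted, comparing two builds is just a comparison of data. The sketch below is Python with made-up snapshot contents and a hypothetical diff_inputs helper; it assumes snapshots are flat key/value maps stored as JSON, which is only one possible representation.

```python
import json


def diff_inputs(previous, current):
    """Compare two flat input snapshots: what was added, removed or changed."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(k for k in shared if previous[k] != current[k]),
    }


# Hypothetical snapshots, as they might be published alongside the artefacts.
build_41 = json.loads('{"compiler.version": "4.8", "arch": "x86", "src.hash": "ab12"}')
build_42 = json.loads(
    '{"compiler.version": "4.8", "arch": "x86", "src.hash": "cd34", "optimise": true}'
)

report = diff_inputs(build_41, build_42)
```

Nothing is configured or executed to produce the report; it only reads what the two builds recorded.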

You get the idea.

Something to note: We're not talking about using tasks to configure models. 
Instead, we're talking here about some underlying concept that is used both for 
building things and configuring models. Tasks would be some sugar over this 
underlying concept, to make it easy to build things. We'd introduce some 
equivalent DSL and APIs which would be sugar to make it easy to configure 
models.

What this means is that we can blur the lines between building things and 
configuring models. For example, things that are outputs of tasks can be the 
input for some configuration logic. Say I use a plugin or task 
produced by another project to configure the current project. That thing needs 
to be built before I can configure the project. Or perhaps I need to generate 
some source files before I decide which artefacts to include in my 
publication. By using a single concept, and treating models and files as things 
that can be inputs and outputs, we can offer a very flexible configuration 
model.

Of course, it's not that simple. There are some 'interesting' problems to solve 
as far as backwards compatibility, usability and performance go, plus lots of 
details to sort out. We've worked through a lot of these and we think we've got 
answers for many of them, and a plan for introducing this incrementally.

More details to come later in the form of a spec and/or code. There's also a 
bit of a spike checked into master, where we've ported the publishing plugins 
to use this approach.


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com

Join us at the Gradle eXchange 2013, Oct 28th in London, UK: 
http://skillsmatter.com/event/java-jee/gradle-exchange-2013
