Hi,

Just some thoughts on how we might spike a solution for incremental Java
compilation, to see if it’s worthwhile and what the effort might be:

The goal is to improve the Java compile tasks, so that they do less work for 
certain kinds of changes. Here, ‘less work’ means compiling fewer source files, 
and also touching fewer output files so that consumers of the task output can 
also do less work. It doesn’t mean compiling the *fewest* possible number of 
source files - just fewer than we do now.

The basic approach comes down to keeping track of dependencies between source 
files and the other compilation inputs - where inputs are source files, the 
compile classpath, the compile settings, and so on. Then, when an input 
changes, we would recompile the source files that depend on that input. 
Currently, we assume that every source file depends on every input, so that 
when an input changes we recompile everything.

Note that we don’t necessarily need to track dependencies at a fine-grained 
level. For example, we may track dependencies between packages rather than 
classes, or we may continue to assume that every source file depends on every 
class in the compile classpath.

A basic solution would look something like:

1. Determine which inputs have changed.
2. If the compile settings have changed, or if we don’t have any history, then 
schedule every source file for compilation, and skip to #5.
3. If a class in the compile classpath has changed, then schedule for 
compilation every source file that depends on this class.
4. If a source file has changed, then schedule for compilation every source 
file that depends on the classes of the source file.
5. For each source file scheduled for compilation, remove the previous output 
for that source file.
6. Invoke the compiler.
7. For each successfully compiled source file, extract the dependency 
information for the classes in the source file and persist this for next time.

For the above, “depends on” includes indirect dependencies.
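
To make steps #3 and #4 a bit more concrete, here is a minimal sketch of 
working out which source files to schedule, including indirect dependents. It 
assumes (my simplification, not part of the proposal) that dependencies are 
tracked per file rather than per class, and that the previous build recorded a 
map from each source file to the inputs it uses directly. The names 
AffectedSources and toRecompile are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class AffectedSources {
    /**
     * directDeps: for every source file, the inputs (source files or classpath
     * entries) it used directly in the previous build. A source file with no
     * dependencies must still be present, mapped to an empty set.
     * changedInputs: the inputs reported as changed since the previous build.
     */
    static Set<String> toRecompile(Map<String, Set<String>> directDeps,
                                   Collection<String> changedInputs) {
        // Invert the recorded graph: input -> source files that use it directly.
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : directDeps.entrySet()) {
            for (String input : entry.getValue()) {
                dependents.computeIfAbsent(input, k -> new HashSet<>()).add(entry.getKey());
            }
        }

        // Breadth-first walk from the changed inputs through their dependents.
        Set<String> scheduled = new TreeSet<>();
        Set<String> visited = new HashSet<>(changedInputs);
        Deque<String> pending = new ArrayDeque<>(visited);
        while (!pending.isEmpty()) {
            String input = pending.remove();
            if (directDeps.containsKey(input)) {
                scheduled.add(input); // only source files can be recompiled
            }
            for (String dependent : dependents.getOrDefault(input, Collections.<String>emptySet())) {
                if (visited.add(dependent)) {
                    pending.add(dependent);
                }
            }
        }
        return scheduled;
    }
}
```

With A.java using B.java and B.java using C.java, a change to C.java schedules 
all three files, while a change to an unrelated D.java schedules only D.java.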

Steps #1 and #2 are already covered by the incremental task API, at least 
enough to spike this.

Step #3 isn’t quite as simple as it is described above:
- Firstly, we can ignore changes for a class with a given name, if a class with 
the same name appears before it in the classpath (this includes the source 
files).
- If a class is removed, this counts as a ‘change’, so that we recompile any 
source files that used to depend on this class.
- If a class is added before some other class with the same name in the 
classpath, then we recompile any source files that used to depend on the old 
class.
- Dependencies can travel through other classes in the classpath, or source 
files, or a combination of both (e.g. a source class depends on a classpath 
class depends on a source class depends on a classpath class).
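
The first three points above all fall out of comparing the *effective* 
classpath before and after, where only the first class with a given name 
counts. A rough sketch, assuming each classpath entry has already been reduced 
to a map from class name to some content hash (ClasspathDiff and its methods 
are hypothetical names, and how the hashes are produced is left open):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class ClasspathDiff {
    /** Resolves each class name to the hash from the first entry that defines it. */
    static Map<String, String> effective(List<Map<String, String>> classpath) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map<String, String> entry : classpath) {
            for (Map.Entry<String, String> cls : entry.entrySet()) {
                visible.putIfAbsent(cls.getKey(), cls.getValue()); // earlier entries win
            }
        }
        return visible;
    }

    /** Names of classes whose visible definition was added, removed or modified. */
    static Set<String> changedClasses(List<Map<String, String>> before,
                                      List<Map<String, String>> after) {
        Map<String, String> old = effective(before);
        Map<String, String> now = effective(after);
        Set<String> names = new TreeSet<>(old.keySet());
        names.addAll(now.keySet());
        Set<String> changed = new TreeSet<>();
        for (String name : names) {
            // A missing class has a null hash, so removal and addition count as changes.
            if (!Objects.equals(old.get(name), now.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

A change to a shadowed copy of a class leaves the effective view untouched and 
so is ignored, whereas removing a class, or adding one ahead of an existing 
class with the same name, shows up as a change to that name.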

Step #4 is similar to step #3.

For a spike, it might be worth simply invalidating everything when the compile 
classpath changes, and just deal with changes in the source files.

For step #7 we have three basic approaches for extracting the dependencies:

The first approach is to use ASM to extract the dependencies from the byte code
after compilation. The upside is that this is very simple to implement and very
fast. We already have an implementation that we use in the tooling API
(ClasspathInferer, although it’s mixed in with some other stuff). It also works
for things that we only have the byte code for.

The downside is that it’s lossy: the compiler inlines constants into the byte
code and discards source-only annotations. We also don’t easily know what type
of dependency it is (is it an implementation detail, or is it visible in the
API of the class?).

Both of these downsides can be addressed. For example, we might treat a class
with a constant field, or a class for a source-only annotation, as a dependency
of every source file, so that when one of these things changes, we would
recompile everything. And to determine the type of dependency, we just need to
dig deeper into the byte code.
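
To give a feel for how little is needed for the byte code approach, here is a 
sketch that pulls the referenced class names out of a class file’s constant 
pool. Note that this is not the ASM-based code mentioned above: it is a 
stand-in that uses only the JDK, and ConstantPoolDeps is a made-up name. It is 
also lossier than a proper visitor, since types that appear only in field or 
method descriptors are not picked up, and the result includes the class 
itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class ConstantPoolDeps {
    /** Returns the names of all classes referenced by CONSTANT_Class entries. */
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            Map<Integer, String> utf8 = new HashMap<>();
            List<Integer> classNameIndexes = new ArrayList<>();
            for (int i = 1; i < count; i++) { // constant pool indexes start at 1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8.put(i, in.readUTF()); break;                     // Utf8
                    case 7: classNameIndexes.add(in.readUnsignedShort()); break;  // Class
                    case 8: case 16: case 19: case 20: in.readUnsignedShort(); break;
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readInt(); break;
                    case 5: case 6: in.readLong(); i++; break; // long/double use two slots
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndexes) {
                String name = utf8.get(index);
                if (name == null) {
                    continue;
                }
                if (name.startsWith("[")) { // array type, e.g. [Ljava/lang/Object;
                    int start = name.lastIndexOf('[') + 1;
                    if (name.charAt(start) != 'L') {
                        continue; // array of primitives
                    }
                    name = name.substring(start + 1, name.length() - 1);
                }
                names.add(name.replace('/', '.'));
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bytes could come from something like Files.readAllBytes on a file in the 
task’s output directory, or from an entry in a jar on the compile classpath.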

The second approach is to use the compiler API that we are already using to
invoke the compiler to query the dependencies during compilation. The upside is
that we get the full source dependency information. The downsides are that we
have to use a Sun-specific extension of the compiler API to do this, and it’s a
very complicated API, which makes it fiddly to get right.

The third approach is to parse and analyse the source separately from 
compilation.

I’d probably try out the first option, as it’s the simplest to implement and 
probably the fastest at execution time.

There are some issues around making this efficient.

First, we need to make the persistence mechanism fast. For the spike, let’s 
assume we can do this. I would just keep the state in some static field 
somewhere and not bother with persistence.

Second, we need to make the calculation of affected source files fast. One 
option is to calculate this when something changes rather than each time we run 
the compilation task, so that we keep, basically, a map from input file to the 
closure of all source files affected by that input file.

Third, we need to keep the dependency graph as small as we can. So, we might 
play around with tracking dependencies between packages rather than classes. We 
should also ignore dependencies that are not visible to the consumer, so that 
we don’t traverse the dependencies of method bodies, or private elements.

Finally, we should ignore changes that are not visible to the consumer, so that
we ignore changes to method bodies, private elements of a class, the
annotations of classes, debug info and so on. This is relatively easy for
changes to the compile classpath. For changes to source files, it’s a bit
trickier, as we don’t know what’s changed until we compile the source file. We
could, potentially, compile in two passes: first the source files that have
changed, and then the source files that have not changed but depend on those
that have. Something, potentially, to play with as part of a spike.


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com


