This is a design doc about speeding up the link phase of GWT. If you don't maintain a linker, and if you don't have a multi-machine GWT build, then none of this should matter to you. If you do maintain a linker, let's make sure your linker can be updated with the proposed changes. If you do have a multi-machine build, or if you have some ideas about them, then perhaps you can help us get the best speed benefit possible out of this.
I want to speed up linking for multi-machine builds in two ways:

1. Allow more parts of linking to run in parallel. In particular, anything that happens once per permutation and does not need information from other permutations can run in parallel. As an example, the iframe linker chunks the JavaScript of each permutation into multiple <script> tags. That work can happen in parallel once the linker API supports it.

2. Link does a lot of Java serialization for its artifacts, but the majority of the artifacts in a compile are emitted artifacts that have no structure. They are just a named bag of bits, from the compiler's perspective. It would help if such artifacts did not need a round of Java serialization on the Link node and could instead be bulk copied.

=== Transition ===

The compiler will support two compilation modes: maximal sharding and simulated sharding. Maximal sharding is used when all linkers support it and the Precompile/CompilePerms/Link entry points are used. Simulated sharding is used when either some linker can't shard or when the Compiler entry point is used.

Linkers individually indicate whether they implement the sharding or non-sharding API. This allows linkers to be updated one by one and to leave the non-sharding API behind once they have been updated. It does not cause trouble with other linkers, because in practice linkers are highly independent. I've looked at as many linkers as I could find to verify this. Occasionally one linker depends on another; in such a case they'll have to be updated in tandem, but the need for that should be rare.

By default, a linker is assumed to want the legacy non-sharding API. For such a linker, it isn't safe to assume that its generators or its associated artifacts can be serialized and then deserialized on a different computer.

The non-sharding API will be deprecated. After the sharding API has been out for one GWT release cycle, support for non-shardable linkers will be dropped.
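The mode-selection rule above can be illustrated with a small sketch. Everything here is a stand-in written for this example, not compiler code: the local @Shardable annotation only mimics the proposed one, and ModeSelectionSketch, Mode, EntryPoints, and chooseMode are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

public class ModeSelectionSketch {
  /** Local stand-in for the proposed @Shardable annotation (assumption). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Shardable {}

  /** Hypothetical names for the two compilation modes. */
  enum Mode { MAXIMAL_SHARDING, SIMULATED_SHARDING }

  /** Hypothetical marker for which entry point(s) started the build. */
  enum EntryPoints { COMPILER, PRECOMPILE_COMPILEPERMS_LINK }

  /** Example linker that has been updated to the sharding API. */
  @Shardable
  static class UpdatedLinker {}

  /** Example linker still on the legacy API (the default assumption). */
  static class LegacyLinker {}

  /**
   * Maximal sharding needs every linker to be shardable AND the
   * Precompile/CompilePerms/Link entry points; anything else is simulated.
   */
  static Mode chooseMode(EntryPoints entryPoints, List<Class<?>> linkerClasses) {
    boolean allShardable = true;
    for (Class<?> linkerClass : linkerClasses) {
      if (!linkerClass.isAnnotationPresent(Shardable.class)) {
        allShardable = false;
      }
    }
    boolean staged = entryPoints == EntryPoints.PRECOMPILE_COMPILEPERMS_LINK;
    return (allShardable && staged) ? Mode.MAXIMAL_SHARDING : Mode.SIMULATED_SHARDING;
  }

  public static void main(String[] args) {
    List<Class<?>> updatedOnly = Arrays.<Class<?>>asList(UpdatedLinker.class);
    List<Class<?>> mixed =
        Arrays.<Class<?>>asList(UpdatedLinker.class, LegacyLinker.class);

    // prints MAXIMAL_SHARDING
    System.out.println(chooseMode(EntryPoints.PRECOMPILE_COMPILEPERMS_LINK, updatedOnly));
    // prints SIMULATED_SHARDING (one linker cannot shard)
    System.out.println(chooseMode(EntryPoints.PRECOMPILE_COMPILEPERMS_LINK, mixed));
    // prints SIMULATED_SHARDING (Compiler entry point)
    System.out.println(chooseMode(EntryPoints.COMPILER, updatedOnly));
  }
}
```

A single legacy linker is enough to push the whole build into simulated sharding, which is why linkers that depend on each other would need to be updated in tandem.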
=== Maximal sharding ===

Currently, Precompile parses Java into ASTs and runs generators. CompilePerms then runs one copy for each permutation, in parallel. Each instance optimizes the AST for one permutation and then converts it into JavaScript plus some additional artifacts. Finally, Link takes the JavaScript and all the produced artifacts, runs the individual linkers, and produces the final output.

In summary, the three stages are:

current Precompile:
- parse Java and run generators
- output: number of permutations, AST, generated artifacts

current CompilePerms:
- input: permutation id, AST
- compile one permutation to JavaScript
- output: JavaScript, generated artifacts

current Link:
- input: JavaScript from all permutations, generated artifacts
- run linkers on all artifacts
- emit EmittedArtifacts into the final output

With maximal sharding, Precompile does no work except to count the number of permutations. Each CompilePerms instance parses Java into ASTs, runs generators, and optimizes for a specific permutation. Additionally, each CompilePerms instance runs the shardable part of linkers on the results for that permutation. It then "thins" the artifacts (see below) and emits them. Finally, Link takes these results from the CompilePerms instances, runs the final, non-shardable part of each linker, and emits all the artifacts designated as emitted artifacts.
In summary, the maximal-sharding staging looks like this:

new Precompile:
- output: number of permutations

new CompilePerms:
- input: permutation id
- compile one permutation to JavaScript, including running generators
- run the on-shard part of linkers
- thin down the resulting artifacts, as defined below
- output: JavaScript and the thinned down set of artifacts

new Link:
- input: JavaScript and transferable artifacts from each permutation
- run the final part of linkers, which can add more files to the final output
- output: resulting emitted artifacts

=== Simulated Sharding ===

Simulated sharding uses the in-trunk compiler staging, but runs the linkers as much as possible as if they were using the maximal sharding staging. The sequence is the same whether the Compiler entry point is used or the Precompile/CompilePerms/Link trio of entry points is used.

Under simulated sharding, the Precompile and CompilePerms steps run exactly as in trunk. The Link stage, however, runs the linkers in a careful order so as to use the sharded API for those linkers that have been updated:

- For each compiled permutation, run the on-shard part of all shardable linkers. For each permutation, start with a fresh set of artifacts so that the linkers don't see each other's output.
- Combine all of the resulting artifacts.
- Run the non-shardable linkers on those artifacts.
- Thin the artifacts, as defined below.
- Run the final part of all shardable linkers.
- Emit the "output" and "extra" files.

=== Development mode ===

Development mode does not generate any compiled permutations. Thus, it does not run the per-permutation part of linkers. It does, however, need to run the final-link part of linkers. It should do this just after the places it calls link() or relink().
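As a rough illustration of the Link-stage ordering listed under Simulated Sharding, the sketch below records only the order in which the steps would run; it does not touch real artifacts. SimulatedShardingOrder, SketchLinker, and linkOrder are hypothetical names invented for this example, and the linker list is assumed to be already sorted in the order the compiler would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SimulatedShardingOrder {
  /** Hypothetical minimal linker: just a name and a shardable flag. */
  static class SketchLinker {
    final String name;
    final boolean shardable;

    SketchLinker(String name, boolean shardable) {
      this.name = name;
      this.shardable = shardable;
    }
  }

  /** Returns a trace of the steps the Link stage would take, in order. */
  static List<String> linkOrder(List<SketchLinker> linkers, int permutationCount) {
    List<String> trace = new ArrayList<>();

    // 1. Per permutation (each starting from its own fresh artifact set),
    //    run the on-shard part of every shardable linker.
    for (int perm = 0; perm < permutationCount; perm++) {
      for (SketchLinker linker : linkers) {
        if (linker.shardable) {
          trace.add(linker.name + ":on-shard:perm" + perm);
        }
      }
    }

    // 2. Combine the per-permutation results.
    trace.add("combine");

    // 3. Run the non-shardable (legacy) linkers on the combined artifacts.
    for (SketchLinker linker : linkers) {
      if (!linker.shardable) {
        trace.add(linker.name + ":legacy");
      }
    }

    // 4. Thin the artifact set.
    trace.add("thin");

    // 5. Run the final part of every shardable linker.
    for (SketchLinker linker : linkers) {
      if (linker.shardable) {
        trace.add(linker.name + ":final");
      }
    }

    // 6. Emit the "output" and "extra" files.
    trace.add("emit");
    return trace;
  }

  public static void main(String[] args) {
    List<SketchLinker> linkers = Arrays.asList(
        new SketchLinker("A", true),    // updated to the sharding API
        new SketchLinker("B", false));  // still on the legacy API
    // prints [A:on-shard:perm0, A:on-shard:perm1, combine, B:legacy, thin, A:final, emit]
    System.out.println(linkOrder(linkers, 2));
  }
}
```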
=== Detailed API changes ===

- Linkers that are updated to be shardable are annotated with a new annotation, @Shardable.
- The Linker.link() method has a new boolean parameter, indicating whether it is running on a shard or on the final node.
- BinaryEmittedArtifact is added as a final subclass of EmittedArtifact, indicating an artifact with no internal structure. The compiler can bulk copy such artifacts rather than using Java serialization.
- There is a new annotation @Transferable that can be added to artifacts. Artifacts without this annotation are subject to thinning, described below.

=== Thinning of an artifact set ===

After the sharded part of a linker runs, the resulting artifact set is thinned down, so as to minimize the amount sent back to the Link node and to minimize the amount of deserialization that Link has to do. Thinning an artifact set does two things:

- All EmittedArtifacts are replaced by a BinaryEmittedArtifact, thus discarding any fields that the EmittedArtifact might have had.
- All other artifacts are discarded, except ones annotated with @Transferable.

=== Order of linkers ===

Whenever the compiler runs a number of linkers, it runs them in the order implied by the PRE, PRIMARY, and POST annotations. This holds both on and off the shards, and for both the shardable and non-shardable link() methods.

-- http://groups.google.com/group/Google-Web-Toolkit-Contributors
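The two thinning rules in the "Thinning of an artifact set" section can be sketched as follows. All classes here are simplified local stand-ins, not the real GWT types: Artifact, EmittedArtifact, BinaryEmittedArtifact, and @Transferable only borrow the names used in this proposal, and their fields (name, bytes) as well as RichEmittedArtifact, SummaryArtifact, ScratchArtifact, and thin() are assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class ThinningSketch {
  /** Local stand-in for the proposed @Transferable annotation (assumption). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Transferable {}

  /** Simplified stand-in for an artifact base class. */
  static class Artifact {}

  /** Simplified stand-in: an emitted artifact is a named bag of bits. */
  static class EmittedArtifact extends Artifact {
    final String name;
    final byte[] bytes;

    EmittedArtifact(String name, byte[] bytes) {
      this.name = name;
      this.bytes = bytes;
    }
  }

  /** Emitted artifact with no internal structure; safe to bulk copy. */
  static final class BinaryEmittedArtifact extends EmittedArtifact {
    BinaryEmittedArtifact(String name, byte[] bytes) {
      super(name, bytes);
    }
  }

  /** Hypothetical emitted artifact carrying an extra field that thinning drops. */
  static class RichEmittedArtifact extends EmittedArtifact {
    final Object extra;

    RichEmittedArtifact(String name, byte[] bytes, Object extra) {
      super(name, bytes);
      this.extra = extra;
    }
  }

  /** Hypothetical non-emitted artifact that should reach the Link node. */
  @Transferable
  static class SummaryArtifact extends Artifact {}

  /** Hypothetical non-emitted artifact that is only useful on the shard. */
  static class ScratchArtifact extends Artifact {}

  /** Applies the two thinning rules and returns the thinned set. */
  static List<Artifact> thin(List<Artifact> artifacts) {
    List<Artifact> thinned = new ArrayList<>();
    for (Artifact artifact : artifacts) {
      if (artifact instanceof EmittedArtifact) {
        // Rule 1: keep only the name and bytes; any other fields are discarded.
        EmittedArtifact emitted = (EmittedArtifact) artifact;
        thinned.add(new BinaryEmittedArtifact(emitted.name, emitted.bytes));
      } else if (artifact.getClass().isAnnotationPresent(Transferable.class)) {
        // Rule 2: non-emitted artifacts survive only if marked @Transferable.
        thinned.add(artifact);
      }
    }
    return thinned;
  }

  public static void main(String[] args) {
    List<Artifact> artifacts = new ArrayList<>();
    artifacts.add(new RichEmittedArtifact("app.js", new byte[] {1, 2}, "extra"));
    artifacts.add(new SummaryArtifact());
    artifacts.add(new ScratchArtifact());
    // prints 2: the emitted artifact (now binary) and the @Transferable one
    System.out.println(thin(artifacts).size());
  }
}
```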