[gwt-contrib] RFC: sharded linking

Lex Spoon Tue, 09 Feb 2010 14:31:25 -0800

This is a design doc about speeding up the link phase of GWT.  If you don't
maintain a linker, and if you don't have a multi-machine GWT build, then
none of this should matter to you.  If you do maintain a linker, let's make
sure your linker can be updated with the proposed changes.  If you do have a
multi-machine build, or if you have some ideas about them, then perhaps you
can help us get the best speed benefit possible out of this.


I want to speed up linking for multi-machine builds in two ways:

1. Allow more parts of linking to run in parallel.  In particular, anything
that happens once per permutation and does not need information from other
permutations can run in parallel.  As an example, the iframe linker chunks
the JavaScript of each permutation into multiple <script> tags.  That work
can happen in parallel once the linker API supports it.

2. Link does a lot of Java serialization for its artifacts, but the majority
of the artifacts in a compile are emitted artifacts that have no structure.
They are just a named bag of bits, from the compiler's perspective.  It
would help if such artifacts did not need a round of Java serialization on
the Link node and could instead be bulk copied.


=== Transition ===

The compiler will support two compilation modes: maximal sharding and
simulated sharding.  Maximal sharding is used when all linkers support it
and the Precompile/CompilePerms/Link entry points are used.  Simulated
sharding is used when either some linker can't shard or when the Compiler
entry point is used.
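
As a rough sketch of the rule above, the choice between the two modes could be written as follows. The names ShardingModeSketch, LinkerInfo, and chooseMode are hypothetical and exist only for illustration; they are not part of the compiler.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are part of GWT.
final class ShardingModeSketch {
  enum Mode { MAXIMAL, SIMULATED }

  /** Stand-in for a linker; records only whether it implements the sharding API. */
  static final class LinkerInfo {
    final String name;
    final boolean shardable;

    LinkerInfo(String name, boolean shardable) {
      this.name = name;
      this.shardable = shardable;
    }
  }

  /**
   * Maximal sharding needs every linker to be shardable AND the separate
   * Precompile/CompilePerms/Link entry points; anything else is simulated.
   */
  static Mode chooseMode(List<LinkerInfo> linkers, boolean separateEntryPoints) {
    boolean allShardable = true;
    for (LinkerInfo linker : linkers) {
      allShardable &= linker.shardable;
    }
    return (allShardable && separateEntryPoints) ? Mode.MAXIMAL : Mode.SIMULATED;
  }

  public static void main(String[] args) {
    List<LinkerInfo> updated = Arrays.asList(
        new LinkerInfo("primary", true), new LinkerInfo("symbolMaps", true));
    List<LinkerInfo> mixed = Arrays.asList(
        new LinkerInfo("primary", true), new LinkerInfo("legacy", false));

    System.out.println(chooseMode(updated, true));   // all shardable, split entry points
    System.out.println(chooseMode(updated, false));  // single Compiler entry point
    System.out.println(chooseMode(mixed, true));     // one legacy linker present
  }
}
```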

Linkers individually indicate whether they implement the sharding or
non-sharding API. This allows linkers to be updated one by one and to leave
the non-sharding API behind once they do. It does not cause trouble with
other linkers, because in practice linkers are highly independent.  I've
looked at as many linkers as I could find to verify this.  Occasionally one
linker depends on another; in such a case they'll have to be updated in
tandem, but the need for that should be rare.

By default, a linker is assumed to want the legacy non-sharding API. For
such linkers, it isn't safe to assume that its generators or its associated
artifacts can be serialized and then deserialized on a different
computer.

The non-sharding API will be deprecated.  After the sharding API has been
out for one GWT release cycle, support for non-shardable linkers will be
dropped.


=== Maximal sharding ===

Currently, Precompile parses Java into ASTs and runs
generators. CompilePerms then runs one copy for each permutation, in
parallel. Each instance optimizes the AST for one permutation and then
converts it into JavaScript plus some additional artifacts. Finally, Link
takes the JavaScript and all the produced artifacts, runs the individual
linkers, and produces the final output. In summary, the three stages are:

current Precompile:

   - parse Java and run generators
   - output: number of permutations, AST, generated artifacts

current CompilePerms:

   - input: permutation id, AST
   - compile one permutation to JavaScript
   - output: JavaScript, generated artifacts

current Link:

   - input: JavaScript from all permutations, generated artifacts
   - run linkers on all artifacts
   - emit EmittedArtifacts into the final output

With maximal sharding, Precompile does no work except to count the number of
permutations. Each CompilePerms instance parses Java into ASTs, runs
generators, and optimizes for a specific permutation. Each CompilePerms
instance also runs the shardable part of linkers on the results for that
permutation. It then "thins" the artifacts (see below) and
emits them. Finally, Link takes these results from the CompilePerms
instances, runs the final, non-shardable part of each linker, and emits all
the artifacts designated as emitted artifacts.  In summary, the
maximal-sharding staging looks like this:

new Precompile:

   - output: number of permutations

new CompilePerms:

   - input: permutation id
   - compile one permutation to JavaScript, including running generators
   - run the on-shard part of linkers
   - thin down the resulting artifacts, as defined below
   - output: JavaScript and the thinned down set of artifacts

new Link:

   - input: JavaScript and transferable artifacts from each permutation
   - run the final part of linkers, which can add more files to the final
   output
   - output: resulting emitted artifacts


=== Simulated sharding ===

Simulated sharding uses the in-trunk compiler staging, but runs the linkers
as much as possible as if they were using the maximal sharding staging. The
sequence is the same whether the Compiler entry point is used or the
Precompile/CompilePerms/Link trio of entry points is used. Under
simulated sharding, the Precompile and CompilePerms steps run exactly as in
trunk. The Link stage, however, runs the linkers in a careful order so as to
use the sharded API for those linkers that have been updated:

   - For each compiled permutation, run the on-shard part of
   all shardable linkers. For each permutation, start with a fresh set of
   artifacts so that the linkers don't see each other's output.
   - Combine all of the resulting artifacts.
   - Run the non-shardable linkers on those artifacts.
   - Thin the artifacts, as defined below
   - Run the final part of all shardable linkers.
   - Emit the "output" and "extra" files.


=== Development mode ===

Development mode does not generate any compiled permutations. Thus, it does
not run the per-permutation part of linkers. It does, however, need to run
the final-link part of linkers. It should do this just after the places it
calls link() or relink().


=== Detailed API changes ===


   - Linkers that are updated to be shardable are annotated with a new
   annotation @Shardable
   - The Linker.link() method has a new boolean parameter, indicating
   whether it is running on a shard or on the final node.
   - BinaryEmittedArtifact is added as a final subclass of EmittedArtifact,
   indicating an artifact with no internal structure.  The compiler can bulk
   copy such artifacts rather than using Java serialization.
   - There is a new annotation @Transferable that can be added to artifacts.
    Artifacts without this annotation are subject to thinning, described below.
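
To make the shape of a shardable linker concrete, here is a minimal self-contained sketch. It does not use the real GWT types: the Linker interface, the @Shardable annotation, and the Map-based artifact set below are local stand-ins, and the parameter name onShard and the ManifestLinker class are invented for illustration. The proposal itself only says that link() gains a boolean parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: local stand-ins, not the real GWT linker API.
final class ShardableLinkerSketch {
  /** Stand-in for the proposed marker annotation. */
  @interface Shardable {}

  /** Stand-in for link() with its new boolean parameter; artifacts are name -> contents. */
  interface Linker {
    Map<String, String> link(Map<String, String> artifacts, boolean onShard);
  }

  @Shardable
  static final class ManifestLinker implements Linker {
    @Override
    public Map<String, String> link(Map<String, String> artifacts, boolean onShard) {
      Map<String, String> out = new LinkedHashMap<>(artifacts);
      if (onShard) {
        // Per-permutation work: needs nothing from other permutations.
        for (Map.Entry<String, String> e : artifacts.entrySet()) {
          if (e.getKey().endsWith(".js")) {
            out.put(e.getKey() + ".html", "<script>" + e.getValue() + "</script>");
          }
        }
      } else {
        // Final-node work: needs to see the artifacts of every permutation.
        out.put("manifest.txt", String.join("\n", artifacts.keySet()));
      }
      return out;
    }
  }

  public static void main(String[] args) {
    Linker linker = new ManifestLinker();

    Map<String, String> shardInput = new LinkedHashMap<>();
    shardInput.put("p0.js", "a()");
    System.out.println(linker.link(shardInput, true).keySet());

    Map<String, String> finalInput = new LinkedHashMap<>();
    finalInput.put("p0.js", "a()");
    finalInput.put("p1.js", "b()");
    System.out.println(linker.link(finalInput, false).get("manifest.txt"));
  }
}
```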


=== Thinning of an artifact set ===

After the sharded part of a linker runs, the resulting artifact set is
thinned down, so as to minimize the amount sent back to the Link node and to
minimize the amount of deserialization that Link has to do. Thinning an
artifact set does two things:

   - Each EmittedArtifact is replaced by a BinaryEmittedArtifact, thus
   discarding any fields that the EmittedArtifact might have had.
   - All other artifacts are discarded, except ones annotated with
   @Transferable.
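
The two thinning rules can be sketched as follows. The classes here are simplified stand-ins declared locally, not the real GWT artifact classes; the fields partialPath and contents, and the example subclasses RichEmittedArtifact, SelectionInfo, and ScratchData, are assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: local stand-ins, not the real GWT artifact classes.
final class ThinningSketch {
  /** Stand-in annotation; RUNTIME retention so it can be checked reflectively. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Transferable {}

  static class Artifact {}

  static class EmittedArtifact extends Artifact {
    final String partialPath;
    final byte[] contents;

    EmittedArtifact(String partialPath, byte[] contents) {
      this.partialPath = partialPath;
      this.contents = contents;
    }
  }

  /** A named bag of bits with no further structure. */
  static final class BinaryEmittedArtifact extends EmittedArtifact {
    BinaryEmittedArtifact(String partialPath, byte[] contents) {
      super(partialPath, contents);
    }
  }

  /** An emitted artifact with an extra field that thinning discards. */
  static final class RichEmittedArtifact extends EmittedArtifact {
    final String note;

    RichEmittedArtifact(String partialPath, byte[] contents, String note) {
      super(partialPath, contents);
      this.note = note;
    }
  }

  @Transferable
  static final class SelectionInfo extends Artifact {}

  static final class ScratchData extends Artifact {}

  static List<Artifact> thin(List<Artifact> artifacts) {
    List<Artifact> thinned = new ArrayList<>();
    for (Artifact a : artifacts) {
      if (a instanceof BinaryEmittedArtifact) {
        thinned.add(a); // already in thinned form
      } else if (a instanceof EmittedArtifact) {
        // Rule 1: keep only the name and the bytes.
        EmittedArtifact e = (EmittedArtifact) a;
        thinned.add(new BinaryEmittedArtifact(e.partialPath, e.contents));
      } else if (a.getClass().isAnnotationPresent(Transferable.class)) {
        // Rule 2: non-emitted artifacts survive only if marked @Transferable.
        thinned.add(a);
      }
    }
    return thinned;
  }

  public static void main(String[] args) {
    List<Artifact> input = Arrays.asList(
        new RichEmittedArtifact("app.js", new byte[] {1, 2}, "debug info"),
        new SelectionInfo(),
        new ScratchData());
    for (Artifact a : thin(input)) {
      System.out.println(a.getClass().getSimpleName());
    }
  }
}
```

Because the stand-in annotation is not marked @Inherited, this sketch only keeps artifacts whose own class carries it; whether subclasses should inherit transferability is left open here.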


=== Order of linkers ===

Whenever the compiler runs a number of linkers, it runs them in the order
implied by the PRE, PRIMARY, and POST annotations.  This holds both on the
shards and on the final Link node, and for both the shardable and
non-shardable link() methods.
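
A minimal sketch of that ordering, using a local Order enum as a stand-in for the PRE, PRIMARY, and POST annotations. LinkerInfo and runOrder are hypothetical names, and keeping linkers of the same kind in their original relative order is an assumption of this sketch, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: orders linkers by a stand-in for PRE / PRIMARY / POST.
final class LinkerOrderSketch {
  enum Order { PRE, PRIMARY, POST }

  static final class LinkerInfo {
    final String name;
    final Order order;

    LinkerInfo(String name, Order order) {
      this.name = name;
      this.order = order;
    }
  }

  /** Returns linker names in run order; List.sort is stable, so ties keep input order. */
  static List<String> runOrder(List<LinkerInfo> linkers) {
    List<LinkerInfo> sorted = new ArrayList<>(linkers);
    sorted.sort(Comparator.comparing((LinkerInfo l) -> l.order));
    List<String> names = new ArrayList<>();
    for (LinkerInfo linker : sorted) {
      names.add(linker.name);
    }
    return names;
  }

  public static void main(String[] args) {
    List<LinkerInfo> linkers = Arrays.asList(
        new LinkerInfo("post", Order.POST),
        new LinkerInfo("primary", Order.PRIMARY),
        new LinkerInfo("pre1", Order.PRE),
        new LinkerInfo("pre2", Order.PRE));
    System.out.println(runOrder(linkers));
  }
}
```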

-- 
http://groups.google.com/group/Google-Web-Toolkit-Contributors
