Re: [gwt-contrib] Re: RFC: sharded linking

Scott Blum Thu, 18 Feb 2010 16:10:34 -0800

On Wed, Feb 17, 2010 at 6:17 PM, Lex Spoon <sp...@google.com> wrote:

> On Tue, Feb 16, 2010 at 3:32 PM, Scott Blum <sco...@google.com> wrote:
>
>> On Fri, Feb 12, 2010 at 7:00 PM, Lex Spoon <sp...@google.com> wrote:
>>
>>> On Fri, Feb 12, 2010 at 9:50 AM, Alex Moffat <alex.mof...@gmail.com> wrote:
>>>
>>>> Where can I read a description of what -XshardPrecompile does, or see the
>>>> code for it?  It sounds very useful to me personally.
>>>
>>>
>>> -XshardPrecompile is an experiment that everyone wants to change, so it
>>> seems unlikely to be released in its current form.  We can talk about it if
>>> it helps, but I would propose that we focus more on what we want to do for
>>> real.
>>>
>>
>> It seemed relevant because it sounded like you propose to essentially make
>> -XshardPrecompile the default (only?) behavior for Precompile?  Or did I
>> misread?
>>
>
> No, you didn't misread; that's the idea.
>


Okay.  I think making precompiles sharded is maybe what makes some folks
nervous, because there's no way to avoid sending lots of "stuff" over the
wire, as opposed to the current configuration which only needs to send over
the AST for optimizations.  Also, while sharded precompiles take less wall
time to get the whole thing done (provided you have plenty of machine
resources), it takes more total CPU-hours, and could actually take longer if
machine resources are scarce, which could be a concern for some.
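A simplified cost model shows how both effects can hold at once. Suppose, purely as an illustration, that a precompile costs F CPU-hours of shared work plus C per permutation, that there are P permutations and M machines, and that a sharded precompile repeats the shared work on every shard. These symbols and the repetition assumption are not from the thread:

```latex
\begin{aligned}
\text{CPU}_{\text{single}}  &= F + P\,C, &
\text{wall}_{\text{single}}  &= F + P\,C, \\
\text{CPU}_{\text{sharded}} &= P\,(F + C), &
\text{wall}_{\text{sharded}} &\approx \lceil P / M \rceil\,(F + C).
\end{aligned}
```

Under this model sharding costs an extra (P - 1)F CPU-hours. With at least P machines the wall time drops to F + C, but with a single machine it rises to P(F + C), which exceeds F + PC whenever P > 1 and F > 0. Network transfer, which the model ignores, only adds to the sharded side.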

I'm not (at this time) making a case for or against making precompiles
always sharded.  Rather, I'm asking whether that argument can, or cannot, be
separated from sharded linking.  I'm not clear on whether sharded linking
inherently requires sharded precompiles.


>
>
>> The reason that makes me cautious has to do with a desire for a future
>> change to the Generator API to support things like minimal rebuild.  I
>> imagine a world where the work each Generator does could be sharded out in a
>> way that's independent of the number of permutations.
>>
>
> Are you saying that you want to not have to shard, with future
> developments?  I don't think that should be a problem with this patch.  As a
> case in point, the Compiler entry point *could* shard out generating and
> linking, but it chooses not to.  We have the flexibility to play around with
> these choices over time.
>

Ok, good point.


> Everyone is happy, I think, with having dev mode run a single on-shard
> linking step.  So, these are just details.  FWIW, here is how it is in the
> patch:
>
> 1. Resources are available via ResourceOracle.
> 2. Public artifacts are there.  They are identical on all permutations,
> so they aren't added to the artifact set until the final link step.
> 3. Generated artifacts are there for compilation, but not for development
> mode.  With development mode, all linking is done before the generators run,
> and generators run on demand.
>
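One reading of the three points above, written as a lookup of which artifact kinds a linker finds in its artifact set. The names `LinkPass`, `ArtifactKind`, and `ArtifactVisibility` are hypothetical and not GWT classes. Resources are left out because they come from the ResourceOracle rather than the artifact set, and the idea that the final link also sees generated artifacts is an assumption, not something the thread states.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the rules listed above; these are not GWT classes.
enum LinkPass { ON_SHARD, FINAL }

enum ArtifactKind { PUBLIC, GENERATED }

final class ArtifactVisibility {
  private ArtifactVisibility() {}

  /**
   * Artifact kinds present in the artifact set for a given pass.
   * Resources are not modeled: linkers read them via the ResourceOracle.
   */
  static Set<ArtifactKind> visibleKinds(LinkPass pass, boolean devMode) {
    Set<ArtifactKind> kinds = EnumSet.noneOf(ArtifactKind.class);
    if (devMode) {
      // Dev mode links once, before any generator has run, so only the
      // public artifacts can be present.
      kinds.add(ArtifactKind.PUBLIC);
      return kinds;
    }
    // Compilation: generator output exists before linking starts.
    // (Assumption: it is also carried forward to the final link.)
    kinds.add(ArtifactKind.GENERATED);
    if (pass == LinkPass.FINAL) {
      // Public artifacts are identical for every permutation, so they
      // join the artifact set only at the final link step.
      kinds.add(ArtifactKind.PUBLIC);
    }
    return kinds;
  }
}
```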

In the current design, linkers get run again (via relink()) whenever
additional generated resources are created.  That gives any linkers which
have connections to generators a chance to run again any time new generated
resources become available.  Are you saying it will no longer work this way?
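To make the question concrete, here is a minimal sketch of the relink() flow described above. `DevModeLinker` and `DevModeSession` are hypothetical, and modeling artifacts as plain strings is a simplification; GWT's real linker API passes richer types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; GWT's real linker API passes richer objects.
interface DevModeLinker {
  /** Runs once, before any generator has produced output. */
  void link(List<String> artifacts);

  /** Runs again whenever generators emit new artifacts. */
  void relink(List<String> newArtifacts);
}

final class DevModeSession {
  private final List<DevModeLinker> linkers;
  private final List<String> allArtifacts = new ArrayList<>();

  DevModeSession(List<DevModeLinker> linkers, List<String> publicArtifacts) {
    this.linkers = List.copyOf(linkers);
    allArtifacts.addAll(publicArtifacts);
    // Initial link happens up front; generators run later, on demand.
    for (DevModeLinker linker : this.linkers) {
      linker.link(List.copyOf(allArtifacts));
    }
  }

  /** Called after an on-demand generator run produces new resources. */
  void onGeneratedArtifacts(List<String> generated) {
    if (generated.isEmpty()) {
      return; // nothing new, so no relink
    }
    allArtifacts.addAll(generated);
    // Every linker gets a chance to react to the new resources.
    for (DevModeLinker linker : linkers) {
      linker.relink(List.copyOf(generated));
    }
  }
}
```

The open question is whether a linker connected to a generator would still receive the equivalent of `relink` under the patch.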


> ----------- you write (gmail just messed up my reply quotes): ----
>
>  Now that I am thinking along those lines, it almost begs the question.  If
> we are willing to break the world, is this the best possible way to model
> the new link process?  In other words, it seems worth re-examining the design
> without regard to the existing API and asking ourselves if it's the thing
> we'd have designed from scratch.  Maybe you guys all already did that and
> I'm the only one late to the party.
>
> For example, if we're going from scratch, then we could avoid the
> transition entirely and just mandate what the new rules are.  We wouldn't
> need a @Shardable annotation, since all linkers would need to be
> sharding-aware.  We might prefer two separate methods for sharded vs.
> non-sharded linking over a boolean parameter.  We might revisit the whole PRE,
> PRIMARY, POST thing with regards to sharding and decide the right answer is
> SHARD, PRE, PRIMARY, POST.  Or something.  I don't know what the right
> answers are.  All I'm saying is, breaking things is awesome when you're
> doing something revolutionary and the end result is awesome.  I just want to
> be sure, if we're going to break things, that we believe we'll end up
> somewhere revolutionary and awesome as opposed to evolutionary and
> incremental, but less than awesome.
>
> --------------------------------------------------------------------------------
>
>
> I initially proposed simply breaking the world.  However, at your
> encouragement, this patch has developed to be backwards compatible.  As
> things stand, this patch both delivers a large improvement and is evolutionary.
>

Okay, I am convinced now that this change is more evolutionary than it
sounded like from the high level description.  (For example, it sounded like
link() was actually getting an extra parameter-- a breaking change-- but
when I actually looked at the patch, I saw it was a new overload.)
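The overload-versus-new-parameter distinction matters for source compatibility. Below is a minimal sketch, using a hypothetical `SketchLinker` base class rather than GWT's real `Linker`, of how adding an overload leaves an unmodified legacy linker compiling. The parameter name `onePermutation` and the default behavior shown are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; GWT's real Linker takes different parameter types.
abstract class SketchLinker {
  /** Legacy entry point that existing linkers already override. */
  public abstract List<String> link(List<String> artifacts);

  /**
   * New overload added next to the old method. This default keeps legacy
   * behavior: pass artifacts through untouched on a single-permutation
   * shard, and run the old method only on the final link.
   */
  public List<String> link(List<String> artifacts, boolean onePermutation) {
    if (onePermutation) {
      return artifacts;
    }
    return link(artifacts);
  }
}

/** A legacy linker that compiles unchanged: it overrides only link(List). */
final class LegacyManifestLinker extends SketchLinker {
  @Override
  public List<String> link(List<String> artifacts) {
    List<String> out = new ArrayList<>(artifacts);
    out.add("manifest.txt"); // hypothetical output file
    return out;
  }
}
```

Had the extra parameter been added to the existing method instead, `LegacyManifestLinker` would no longer override anything and would fail to compile.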


> On those specific changes:
>
> 1. @Shardable can certainly be dropped after a deprecation period.  Is
> there any urgency to drop it immediately?
>
> 2. Two separate methods versus one with a boolean looks fine to me.  It's
> changed back and forth as the patch developed.
>
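Point 1 can be pictured as an opt-in marker checked at scheduling time. In this hedged sketch, `ShardableSketch` stands in for the `@Shardable` annotation discussed here, and `LinkScheduler`, `GzipSketchLinker`, and `LegacySketchLinker` are hypothetical. Only annotated linkers run on shards; unannotated ones keep the old behavior of running once at the final link.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the @Shardable marker discussed in the thread.
@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection
@Target(ElementType.TYPE)
@interface ShardableSketch {}

@ShardableSketch
final class GzipSketchLinker {}   // opted in: sharding-aware

final class LegacySketchLinker {} // not annotated: legacy behavior

final class LinkScheduler {
  private LinkScheduler() {}

  /**
   * Linkers that may run on a shard: only those that opted in.
   * Unannotated linkers would run only in the final link step, where
   * they see every permutation's artifacts at once.
   */
  static <T> List<T> onShard(List<T> linkers) {
    List<T> result = new ArrayList<>();
    for (T linker : linkers) {
      if (linker.getClass().isAnnotationPresent(ShardableSketch.class)) {
        result.add(linker);
      }
    }
    return result;
  }
}
```

Dropping the annotation after a deprecation period would amount to `onShard` returning every linker, i.e. the "all linkers are sharding-aware" world described earlier in the thread.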

Ok, that all sounds good.


> 3. PRE/PRIMARY/POST still appear to be useful.  All linkers care whether
> they are primary or not, because there is one primary linker and it must
> deal with generating a selection script.  Additionally, a few linkers care
> whether they go before or after the primary linker.
>
> 4. SHARD as a separate linker order is very tempting but turns out to have
> some problems. First, many linkers have both an on-shard and on-final part,
> and if SHARD was a separate order then those linkers would have to be
> subdivided into two linkers.  Instead of IFrameLinker, we'd have to have
> IFrameShardLinker and IFrameFinalLinker.  Second, the SHARD part also has
> PRE/PRIMARY/POST, so you really have six linker orders, not four.  It's
> tidier to represent the six as two times three.
>
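The "two times three" arrangement can be sketched as a nested loop over two hypothetical enums, `Phase` and `Order` (not GWT types). A linker declares a single order, such as PRIMARY for the IFrameLinker, and is scheduled in that slot in both phases, so no paired shard/final linker classes are needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enums for illustration; not GWT's actual types.
enum Phase { SHARD, FINAL }

enum Order { PRE, PRIMARY, POST }

final class LinkPlan {
  private LinkPlan() {}

  /** All six (phase, order) passes, in the order they would execute. */
  static List<String> passes() {
    List<String> plan = new ArrayList<>();
    for (Phase phase : Phase.values()) {   // every SHARD pass precedes FINAL
      for (Order order : Order.values()) { // PRE, PRIMARY, POST in each phase
        plan.add(phase + "/" + order);
      }
    }
    return plan;
  }

  /** The two slots occupied by a linker that declares the given order. */
  static List<String> slotsFor(Order order) {
    List<String> slots = new ArrayList<>();
    for (Phase phase : Phase.values()) {
      slots.add(phase + "/" + order);
    }
    return slots;
  }
}
```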

Yeah, the two times three thing... I totally get it, and it makes sense at
one level, definitely.  Certainly having to have "paired" linkers that run
in different phases seems plain bad.  At the same time, thinking about it in
terms of PRE-shard, PRIMARY-shard, POST-shard, PRE-final, PRIMARY-final,
POST-final seems kind of... unpleasant.  It might be totally the right way
to go, I just wish it were less ugly. :)

But I can't think of anything less-ugly that doesn't run into some problems,
either.  The best idea I had was that maybe @LinkerOrder could optionally
take an array of stages in which to run the same linker.  So the
IFrameLinker could have @LinkerOrder(SHARD, PRIMARY) whereas something like
"gzip static resources" could be @LinkerOrder(SHARD, PRE).  I still kind of
like this idea in general, but I can't figure out a good solution to the
question: how do we determine the appropriate order for linkers *within* a pass
(particularly SHARD pass) that were originally ordered via PRE, PRIMARY,
POST?  This approach also gets in the way of the original intent of using
lexical order within the GWT XML to determine the order in which the linkers
run -- and that order is actually defined to have stack-like behavior à la
"earliest definitions run closest to PRIMARY" rather than "earliest first"
or "earliest last".
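The stack-like rule is easier to see with a concrete declaration list. A minimal sketch, assuming a hypothetical `StackOrdering` helper (not GWT code) that takes linkers in the order they are declared in the GWT XML: PRE linkers run in reverse declaration order and POST linkers in declaration order, so the earliest declaration of each kind sits next to PRIMARY.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class StackOrdering {
  enum Order { PRE, PRIMARY, POST }

  /** A linker as declared in the module XML (hypothetical model). */
  static final class Decl {
    final String name;
    final Order order;

    Decl(String name, Order order) {
      this.name = name;
      this.order = order;
    }
  }

  private StackOrdering() {}

  /**
   * Computes run order from declaration order so that the earliest
   * declarations run closest to PRIMARY. Assumes at most one PRIMARY.
   */
  static List<String> runOrder(List<Decl> declared) {
    List<String> pre = new ArrayList<>();
    List<String> primary = new ArrayList<>();
    List<String> post = new ArrayList<>();
    for (Decl d : declared) {
      switch (d.order) {
        case PRE:
          pre.add(d.name);
          break;
        case PRIMARY:
          primary.add(d.name);
          break;
        case POST:
          post.add(d.name);
          break;
      }
    }
    // PRE behaves like a stack: the earliest declaration runs last,
    // immediately before the primary linker.
    Collections.reverse(pre);
    List<String> result = new ArrayList<>(pre);
    result.addAll(primary);
    // POST keeps declaration order: the earliest runs right after PRIMARY.
    result.addAll(post);
    return result;
  }
}
```

For declarations A (PRE), B (POST), C (PRE), D (PRIMARY), E (POST), this yields C, A, D, B, E. The unanswered question above is what should anchor such a stack inside a SHARD pass that mixes linkers originally ordered PRE, PRIMARY, and POST.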

I dunno, if that question could be answered, I think that approach appeals
to me more than 3 times 2.  But I have to admit that 3 times 2 seems like
the best approach we've come up with so far, even if it's a little ugly.

Scott

-- 
http://groups.google.com/group/Google-Web-Toolkit-Contributors
