On 08/10/2010, at 5:18 AM, Steve Appling wrote:

> 
> On Oct 6, 2010, at 4:06 PM, Steve Appling wrote:
> 
>> As part of trying to optimize my build, I have been trying to figure out how 
>> much time was spent by the up-to-date checks of inputs/outputs for each 
>> task.  I just realized that tasks which are actually executed, but which set 
>> the didWork flag to false are marked as UP-TO-DATE.  I can't distinguish 
>> between tasks where execution was skipped and tasks which are executed, but 
>> didn't end up doing anything.
>> 
>> Should tasks that do no work be marked as skipped and up-to-date?  They 
>> aren't really skipped, so should they really be marked that way?  If they 
>> are marked as skipped, can we use a different skip message to distinguish 
>> the two cases?
>> 
>> AbstractTask.setDidWork was never promoted to the Task interface.  I think 
>> perhaps it should be.  I'm setting this now from some DefaultTasks, but 
>> hadn't realized that this is not really public.
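[A minimal sketch of the distinction being asked for here: report tasks that ran but set didWork to false as a separate outcome instead of folding them into UP-TO-DATE. The TaskOutcome and outcomeOf names are hypothetical, not Gradle's actual API.]

```java
// Hypothetical sketch: three outcomes instead of two, so "executed but
// had nothing to do" is distinguishable from "skipped by the
// up-to-date check". Names are illustrative, not Gradle API.
enum TaskOutcome { EXECUTED, UP_TO_DATE, DID_NO_WORK }

class TaskOutcomes {
    static TaskOutcome outcomeOf(boolean skippedByUpToDateCheck, boolean didWork) {
        if (skippedByUpToDateCheck) {
            // Never executed: inputs/outputs were unchanged.
            return TaskOutcome.UP_TO_DATE;
        }
        // Executed; didWork tells us whether it actually did anything.
        return didWork ? TaskOutcome.EXECUTED : TaskOutcome.DID_NO_WORK;
    }
}
```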
>> 
>> If I can distinguish better between input/output checking and tasks that did 
>> no work, I might have some better numbers, but it looks to me like a good 
>> portion of my build is spent doing the input/output up-to-date checks.  This 
>> is particularly expensive for tasks whose inputs are configurations that 
>> contain large numbers of jars.  Hashing the contents of large files like 
>> this is expensive.  I think it would be nice to be able to choose another 
>> strategy for these checks (like comparing modification times and file size). 
>>  I understand that when working with certain version control systems, 
>> modification times may be unreliable, but for people not using those systems 
>> a hash seems overly slow.  I may try this on my build to see how much it 
>> improves.
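[A minimal sketch of the size-plus-timestamp strategy suggested above. SimpleFileSnapshot and its methods are hypothetical names, not the real FileSnapshot API; as noted, this is cheap but can be fooled when modification times are unreliable.]

```java
import java.io.File;

// Hypothetical sketch: an up-to-date check that compares file length and
// lastModified instead of hashing contents. Fast, but misses edits that
// preserve both size and timestamp.
class SimpleFileSnapshot {
    final long length;
    final long lastModified;

    SimpleFileSnapshot(long length, long lastModified) {
        this.length = length;
        this.lastModified = lastModified;
    }

    static SimpleFileSnapshot of(File file) {
        return new SimpleFileSnapshot(file.length(), file.lastModified());
    }

    // Unchanged if neither size nor timestamp differs from the snapshot.
    boolean sameAs(SimpleFileSnapshot other) {
        return length == other.length && lastModified == other.lastModified;
    }
}
```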
> 
> I'm having deja vu now.  I keep wanting to blame the file hashing for the 
> slowness of the up-to-date checking, but after using a real profiler I have 
> discovered (once again) that this is not the problem.  Creating a new 
> FileSnapshot class that uses only the file size and last modified time 
> trimmed just a couple of seconds off the build.  While this might end up 
> being 
> useful, it is not the primary culprit.  Most of the time is spent inside the 
> depths of AbstractFileCollection.iterator.  Out of a total of 20 seconds 
> spent in AbstractTask.execute, 12 were in AbstractFileCollection.iterator.  
> It's finding the files out of a configuration that is slow, not caching the 
> state.  Almost all of this time is in DefaultConfiguration.getFiles.
> 
> I think the problem is inherent in how Ivy is resolving transitive 
> dependencies.  Resolving the transitive dependencies of configurations in a 
> big multi-project build ends up re-evaluating the same dependent 
> configurations over and over again.  In my case 61 calls to 
> DefaultConfiguration.getFiles turned into 27,186 calls to Ivy's 
> ResolveEngine.fetchDependencies.  Could we lock down changes to the 
> configurations during the execution phase and cache the results in some way 
> so each configuration is only really resolved once?  I think this would mean 
> not using Ivy for configurations, though.
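[A rough sketch of the resolve-once idea proposed above: if configurations are locked during the execution phase, the resolved file set can be memoised per configuration so only the first getFiles call pays for resolution. CachingResolver and the String-keyed map are purely illustrative, nothing like Gradle's or Ivy's real types.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: resolve each configuration at most once and reuse
// the result, instead of re-running transitive resolution for every
// getFiles() call.
class CachingResolver {
    private final Map<String, Set<String>> resolved = new HashMap<>();
    int resolveCount = 0; // exposed only so the sketch can show the saving

    Set<String> getFiles(String configuration, Supplier<Set<String>> resolver) {
        return resolved.computeIfAbsent(configuration, name -> {
            resolveCount++; // a real resolve happens only on a cache miss
            return resolver.get();
        });
    }
}
```

In the scenario above, the 61 getFiles calls would each hit the cache after the first resolution of their configuration, rather than fanning out into tens of thousands of fetchDependencies calls.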

I suspect there are some good opportunities for caching in there somewhere. I 
don't know the internals of Ivy well enough to suggest what that might be.

There are some issues with reusing a resolved configuration in another (whether 
inherited or as a dependency), but I'm sure we could come up with something to 
solve this:

* There may be additional dependency conflicts.
* There may be additional exclude rules.


--
Adam Murdoch
Gradle Developer
http://www.gradle.org
CTO, Gradle Inc. - Gradle Training, Support, Consulting
http://www.gradle.biz
