On Oct 6, 2010, at 4:06 PM, Steve Appling wrote:

> As part of trying to optimize my build, I have been trying to figure out how 
> much time was spent by the up-to-date checks of inputs/outputs for each task. 
>  I just realized that tasks which are actually executed, but which set the 
> didWork flag to false are marked as UP-TO-DATE.  I can't distinguish between 
> tasks where execution was skipped and tasks which are executed, but didn't 
> end up doing anything.
> 
> Should tasks that do no work be marked as skipped and up-to-date?  They 
> aren't really skipped, so should they really be marked that way?  If they are 
> marked as skipped, can we use another skip message to distinguish the two 
> cases?
> 
> AbstractTask.setDidWork was never promoted to the Task interface.  I think 
> perhaps it should be.  I'm setting this now from some DefaultTasks, but 
> hadn't realized that this is not really public.
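> 
> For reference, a minimal sketch of what I'm doing, with illustrative class 
> and helper names (only DefaultTask, @TaskAction and setDidWork are real 
> API here):
> 
>     import org.gradle.api.DefaultTask;
>     import org.gradle.api.tasks.TaskAction;
> 
>     public class SyncFilesTask extends DefaultTask {
>         @TaskAction
>         public void sync() {
>             // Decide at execution time whether anything actually changed.
>             boolean changedAnything = copyOutOfDateFiles();
>             // setDidWork lives on AbstractTask, not on the Task interface.
>             setDidWork(changedAnything);
>         }
> 
>         private boolean copyOutOfDateFiles() {
>             // ... real work elided for the example ...
>             return false;
>         }
>     }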
> 
> If I can distinguish better between input/output checking and tasks that did 
> no work, I might have some better numbers, but it looks to me like a good 
> portion of my build is spent doing the input/output up-to-date checks.  This 
> is particularly expensive for tasks whose inputs are configurations that 
> contain large numbers of jars.  Hashing the contents of large files like this 
> is expensive.  I think it would be nice to be able to choose another strategy 
> for these checks (like comparing modification times and file size).  I 
> understand that when working with certain version control systems, 
> modification times may be unreliable, but for people not using those systems 
> a hash seems overly slow.  I may try this on my build to see how much it 
> improves.
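> 
> To make the cost concrete: a hash-based check has to read every byte of 
> every input jar on every check.  Roughly (plain java.security here, not 
> Gradle's actual hashing code):
> 
>     import java.io.*;
>     import java.security.MessageDigest;
> 
>     class FileHasher {
>         // The whole file is re-read on every up-to-date check, so the
>         // cost grows with the total size of the input jars.
>         static byte[] hash(File f) throws Exception {
>             MessageDigest md = MessageDigest.getInstance("MD5");
>             InputStream in = new BufferedInputStream(new FileInputStream(f));
>             try {
>                 byte[] buf = new byte[8192];
>                 for (int n = in.read(buf); n != -1; n = in.read(buf)) {
>                     md.update(buf, 0, n);
>                 }
>             } finally {
>                 in.close();
>             }
>             return md.digest();
>         }
>     }
> 
> Comparing modification time and size, by contrast, is a couple of cheap 
> stat calls per file.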

I'm having déjà vu now.  I keep wanting to blame the file hashing for the 
slowness of the up-to-date checking, but after using a real profiler I have 
discovered (once again) that this is not the problem.  Creating a new 
FileSnapshot class that uses only the file size and last modified time 
trimmed just a couple of seconds off the build.  While this might still end 
up being useful, it is not the primary culprit.  Most of the time is spent 
inside the depths of AbstractFileCollection.iterator.  Out of a total of 20 
seconds spent in AbstractTask.execute, 12 were in 
AbstractFileCollection.iterator.  It's finding the files of a configuration 
that is slow, not caching the state.  Almost all of this time is in 
DefaultConfiguration.getFiles.
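
For the record, the snapshot I tried is roughly this shape (my names, not 
the real FileSnapshot API):

    import java.io.File;
    import java.io.Serializable;

    // Rough shape of the experiment: remember length + lastModified
    // instead of hashing contents.
    class CheapFileSnapshot implements Serializable {
        private final long length;
        private final long lastModified;

        CheapFileSnapshot(File f) {
            length = f.length();
            lastModified = f.lastModified();
        }

        boolean matches(File f) {
            return f.length() == length && f.lastModified() == lastModified;
        }
    }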

I think the problem is inherent in how Ivy resolves transitive 
dependencies.  Resolving the transitive dependencies of configurations in a 
big multi-project build ends up re-evaluating the same dependent 
configurations over and over again.  In my case, 61 calls to 
DefaultConfiguration.getFiles turned into 27,186 calls to Ivy's 
ResolveEngine.fetchDependencies.  Could we lock down changes to the 
configurations during the execution phase and cache the results in some way 
so each configuration is only really resolved once?  I think this would 
mean not using Ivy for configurations, though.
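
All names in the following sketch are made up, and I don't know the 
resolution internals well enough to claim it is this simple, but the idea 
would be roughly:

    import java.io.File;
    import java.util.*;

    // Made-up sketch of the idea: once the execution phase starts, treat
    // configurations as frozen and resolve each one at most once.
    class ResolvedConfigurationCache {
        interface Resolver {
            Set<File> resolve(String configurationName); // the expensive Ivy work
        }

        private final Map<String, Set<File>> cache = new HashMap<String, Set<File>>();
        private final Resolver resolver;

        ResolvedConfigurationCache(Resolver resolver) {
            this.resolver = resolver;
        }

        Set<File> getFiles(String configurationName) {
            Set<File> files = cache.get(configurationName);
            if (files == null) {
                files = resolver.resolve(configurationName);
                cache.put(configurationName, files);
            }
            return files;
        }
    }

With 61 getFiles calls collapsing to one resolve per configuration, the 
27,186 fetchDependencies calls should shrink dramatically.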

--
Steve Appling
Automated Logic Research Team
