On 08/10/2010, at 5:18 AM, Steve Appling wrote:
>
> On Oct 6, 2010, at 4:06 PM, Steve Appling wrote:
>
>> As part of trying to optimize my build, I have been trying to figure out
>> how much time was spent by the up-to-date checks of inputs/outputs for
>> each task. I just realized that tasks which are actually executed, but
>> which set the didWork flag to false, are marked as UP-TO-DATE. I can't
>> distinguish between tasks whose execution was skipped and tasks which
>> were executed but didn't end up doing anything.
>>
>> Should tasks that do no work be marked as skipped and up-to-date? They
>> aren't really skipped, so should they really be marked that way? If they
>> are marked as skipped, can we use another skip message to distinguish
>> the two cases?
>>
>> AbstractTask.setDidWork was never promoted to the Task interface. I
>> think perhaps it should be. I'm setting this now from some DefaultTasks,
>> but hadn't realized that this is not really public.
>>
>> If I can distinguish better between input/output checking and tasks that
>> did no work, I might have some better numbers, but it looks to me like a
>> good portion of my build is spent doing the input/output up-to-date
>> checks. This is particularly expensive for tasks whose inputs are
>> configurations that contain large numbers of jars. Hashing the contents
>> of large files like this is expensive. I think it would be nice to be
>> able to choose another strategy for these checks (like comparing
>> modification times and file size). I understand that when working with
>> certain version control systems, modification times may be unreliable,
>> but for people not using those systems a hash seems overly slow. I may
>> try this on my build to see how much it improves.
>
> I'm having deja vu now. I keep wanting to blame the file hashing for the
> slowness of the up-to-date checking, but after using a real profiler I
> have discovered (once again) that this is not the problem.
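For reference, the trade-off between the two snapshot strategies can be sketched in a few lines of plain Java. All class and method names here are hypothetical illustrations, not Gradle's actual classes: a content hash has to read every byte of each jar, while a metadata snapshot only stats the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

// Hypothetical sketch of two up-to-date snapshot strategies.
public class SnapshotSketch {
    // Content hash: reliable, but must read the whole file.
    static String contentHash(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(Files.readAllBytes(file))) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Metadata snapshot: size + last-modified time. Much cheaper for large
    // jars, but can miss changes when timestamps are unreliable (e.g. after
    // a checkout from certain version control systems).
    static String metadataSnapshot(Path file) throws IOException {
        return Files.size(file) + ":" + Files.getLastModifiedTime(file).toMillis();
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("snap", ".jar");
        Files.writeString(f, "contents");
        String before = metadataSnapshot(f);
        System.out.println("unchanged: " + before.equals(metadataSnapshot(f)));
        Files.writeString(f, "new contents"); // size changes, so snapshot changes
        System.out.println("changed: " + !before.equals(metadataSnapshot(f)));
    }
}
```

Note that a size change is always detected by the metadata snapshot; the risky case is a same-size edit with an unchanged timestamp, which only the hash catches.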
> Creating a new FileSnapshot class that uses only the file size and
> last-modified time trimmed just a couple of seconds off the build. While
> this might end up being useful, it is not the primary culprit. Most of
> the time is spent inside the depths of AbstractFileCollection.iterator.
> Out of a total of 20 seconds spent in AbstractTask.execute, 12 were in
> AbstractFileCollection.iterator. It's finding the files out of a
> configuration that is slow, not caching the state. Almost all of this
> time is in DefaultConfiguration.getFiles.
>
> I think the problem is inherent in how Ivy resolves transitive
> dependencies. Resolving the transitive dependencies of configurations in
> a big multi-project build ends up re-evaluating the same dependent
> configurations over and over again. In my case, 61 calls to
> DefaultConfiguration.getFiles turned into 27,186 calls to Ivy's
> ResolveEngine.fetchDependencies. Could we lock down changes to the
> configurations during the execution phase and cache the results in some
> way, so each configuration is only really resolved once? I think this
> would mean not using Ivy for configurations, though.
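To make the scale of the blow-up concrete, here's a toy illustration of memoizing transitive resolution. Everything here is hypothetical (the names have nothing to do with Ivy's or Gradle's real APIs): once configurations are locked down, each one can be resolved exactly once and the result reused, instead of re-walking the same subgraph on every query.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: memoized transitive resolution of configurations.
public class ResolutionCache {
    static int resolveCalls = 0;
    // A tiny dependency graph standing in for a multi-project build.
    static final Map<String, List<String>> graph = Map.of(
            "app",  List.of("core", "util"),
            "core", List.of("util"),
            "util", List.of());
    static final Map<String, Set<String>> cache = new HashMap<>();

    // Returns the transitive dependency closure of a configuration,
    // resolving each configuration at most once.
    static Set<String> resolve(String conf) {
        Set<String> cached = cache.get(conf);
        if (cached != null) return cached;
        resolveCalls++;
        Set<String> result = new LinkedHashSet<>();
        for (String dep : graph.get(conf)) {
            result.add(dep);
            result.addAll(resolve(dep));
        }
        cache.put(conf, result);
        return result;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) resolve("app"); // repeated queries
        // Only 3 real resolutions happen, no matter how often we ask.
        System.out.println("calls=" + resolveCalls);
        System.out.println(resolve("app"));
    }
}
```

The sketch keys the cache purely on the configuration name, which is only valid once configurations are frozen; it deliberately ignores conflict resolution and exclude rules, which would complicate any real reuse scheme.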
I suspect there are some good opportunities for caching in there somewhere. I don't know the internals of Ivy well enough to suggest what that might be. There are some issues with reusing a resolved configuration in another (whether inherited or as a dependency), but I'm sure we could come up with something to solve this:

* There may be additional dependency conflicts.
* There may be additional exclude rules.

--
Adam Murdoch
Gradle Developer
http://www.gradle.org
CTO, Gradle Inc. - Gradle Training, Support, Consulting
http://www.gradle.biz
