On Wed, Sep 28, 2011 at 7:03 AM, Adam Murdoch
<[email protected]> wrote:
>
> On 26/09/2011, at 8:22 PM, Luke Daley wrote:
>
>
> On 26/09/2011, at 6:34 AM, Adam Murdoch wrote:
>
> When resolving a dependency descriptor:
> * We look for a matching dependency in the cache for each resolver. We
> don't invoke the resolvers at this stage. If a matching cached value is
> found that has not expired, we use the cached value, and stop looking.
> * Otherwise, we attempt to resolve the dependency using each resolver in
> turn. We stop on the first resolver that can resolve the dependency.
> * If this fails, and an expired entry was found in the cache, use that
> instead.
> * We remember which resolver found the module, for downloading the
> artifacts later.
>
> When resolving an artifact, we delegate directly to the resolver where the
> artifact's module was found.
>
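The lookup order described above (cache first, then each resolver in turn, then an expired cache entry as a last resort) could be sketched roughly as follows. This is a hypothetical sketch; `Resolver` and `CacheEntry` are illustrative names, not Gradle's actual internal API.

```java
import java.util.*;

// Sketch of the resolution order: unexpired cache hit -> first resolver
// that can resolve -> expired cache entry as a fallback. Types are illustrative.
class ResolutionSketch {
    interface Resolver {
        String name();
        String resolve(String dependency); // null if this resolver cannot resolve it
    }

    static class CacheEntry {
        final String module;
        final String resolverName; // remembered for downloading artifacts later
        final boolean expired;
        CacheEntry(String module, String resolverName, boolean expired) {
            this.module = module; this.resolverName = resolverName; this.expired = expired;
        }
    }

    // cache maps resolver name -> cached result for this dependency
    static String resolve(String dependency, List<Resolver> resolvers,
                          Map<String, CacheEntry> cache) {
        CacheEntry expiredFallback = null;
        // 1. Look for an unexpired cached value per resolver; no network access here.
        for (Resolver r : resolvers) {
            CacheEntry e = cache.get(r.name());
            if (e != null) {
                if (!e.expired) return e.module;          // fresh hit: stop looking
                if (expiredFallback == null) expiredFallback = e;
            }
        }
        // 2. Otherwise try each resolver in turn, stopping at the first success,
        //    and remember which resolver found the module.
        for (Resolver r : resolvers) {
            String module = r.resolve(dependency);
            if (module != null) {
                cache.put(r.name(), new CacheEntry(module, r.name(), false));
                return module;
            }
        }
        // 3. All resolvers failed: fall back to an expired cache entry, if any.
        return expiredFallback != null ? expiredFallback.module : null;
    }
}
```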
> Also, we apply the same expiry time for all dynamic revisions. This
> includes snapshot revisions, changing = true, anything that matches a
> changing pattern, version ranges, dynamic revisions (1.2+, etc) and statuses
> (latest.integration). When we resolve a dynamic dependency descriptor, we
> persist the module that we ended up resolving to, and we use that value
> until the expiry is reached.
>
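The single-expiry policy for dynamic revisions could look something like the sketch below: once a dynamic revision (1.2+, latest.integration, a snapshot, ...) has been resolved to a concrete module, the result is persisted and reused until one time-to-live elapses. The class and method names are hypothetical.

```java
import java.util.*;

// Illustrative sketch: a persisted mapping from a dynamic revision to the
// concrete module it resolved to, honoured until a single TTL expires.
class DynamicRevisionCache {
    static class Entry {
        final String resolvedModule;
        final long resolvedAtMillis;
        Entry(String m, long t) { resolvedModule = m; resolvedAtMillis = t; }
    }

    private final long ttlMillis;
    private final Map<String, Entry> entries = new HashMap<>();

    DynamicRevisionCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void store(String dynamicRevision, String resolvedModule, long nowMillis) {
        entries.put(dynamicRevision, new Entry(resolvedModule, nowMillis));
    }

    // Returns the persisted resolution while it is still within the TTL,
    // or null once expired, which forces a fresh check against the repositories.
    String lookup(String dynamicRevision, long nowMillis) {
        Entry e = entries.get(dynamicRevision);
        if (e == null || nowMillis - e.resolvedAtMillis >= ttlMillis) return null;
        return e.resolvedModule;
    }
}
```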
> Some implications of this:
>
> * We're making a performance-accuracy trade-off here, which means we'll
> probably need some way to tweak the behaviour. Not sure exactly how this
> might look, yet. I'd start with a simple time-to-live property on each
> configuration, and let the use cases drive anything beyond that.
>
> * For dynamic revisions, stopping on the first resolver means we may miss a
> newer revision that happens to be in a later repository. An alternate
> approach might be to use all resolvers for dynamic revisions, but only when
> there is no unexpired value in the cache. We could do this search in
> parallel, and just pick the latest out of those we have found at the end of
> some timeout. Perhaps we could do the search in parallel for all revisions,
> dynamic or not.
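The "ask all resolvers in parallel and keep the newest answer seen by the deadline" idea could be sketched as below. The `Resolver` interface is a placeholder, and plain string comparison stands in for real version ordering.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch only: query every resolver concurrently for a dynamic revision and
// pick the latest candidate available when the timeout expires.
class ParallelDynamicResolve {
    interface Resolver {
        String resolveLatest(String dynamicRevision); // e.g. "1.7", or null
    }

    static String latestWithin(List<Resolver> resolvers, String revision,
                               long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newFixedThreadPool(resolvers.size());
        try {
            List<Callable<String>> queries = new ArrayList<>();
            for (Resolver r : resolvers) queries.add(() -> r.resolveLatest(revision));
            List<Future<String>> results;
            try {
                // invokeAll cancels any query still running when the timeout expires.
                results = pool.invokeAll(queries, timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            String best = null;
            for (Future<String> f : results) {
                try {
                    String v = f.get();
                    // Placeholder ordering: lexical compare stands in for version compare.
                    if (v != null && (best == null || v.compareTo(best) > 0)) best = v;
                } catch (InterruptedException | CancellationException | ExecutionException ignored) {
                    // A slow or failing repository simply contributes no candidate.
                }
            }
            return best;
        } finally {
            pool.shutdownNow();
        }
    }
}
```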
>
>
> If that timeout has expired, we should try all resolvers, I think.
>
> If parallel is achievable in our timeframes then I can't see a reason not
> to.
>
>
> Ivy uses a nice sprinkling of static state, which means it's not going to
> be easy to do parallel resolution. There are a couple of options, however.
>
> One option is to offer parallel resolution for those resolver
> implementations that we provide, as we can make sure they don't use any
> static state. That is, for file and http ivy and maven repositories that you
> define via repositories.maven(), ivy(), mavenCentral() and mavenLocal(),
> plus whatever we add in the future, we could potentially do parallel
> resolution.
>
> This is potentially a lot of work, and it helps only those builds that
> need to use multiple remote repositories. So, the other option is a more
> general solution, which helps all builds:
>
> We could start doing dependency resolution parallel to executing tasks, so
> that, for example, while the unit tests for gradle core are executing, we
> can be resolving and downloading the compile classpath for gradle launcher.
> We'd still do dependency resolution in a single thread. It would just be a
> separate thread to that executing the tasks.
>
> There are a few ways to approach this. I think about this as making the DAG
> finer grained. Currently, each node in the DAG is really made up of a few
> separate pieces of work: resolving any external artifacts that make up the
> input files, checking whether the outputs of the task are up-to-date wrt its
> inputs, and finally executing the actions of the task. I think we should
> bust the task nodes up into separate nodes, each with their own
> dependencies. Here's an example:
>
> task testCompile(type: CompileJava) {
>     classpath = configurations.testCompile + sourceSets.main.output
> }
>
> execute testCompile -> build the inputs of testCompile
> build the inputs of testCompile -> build configurations.testCompile, build
> sourceSets.main.output
> build configurations.testCompile -> the jar tasks from any project
> dependencies in configurations.testCompile
> build sourceSets.main.output -> the compileJava task
>
> A node would be available for execution as soon as all its dependencies
> have been executed. One thread would execute available task nodes, the other
> would "execute" available file collection nodes, that is, resolve external
> dependencies. So, in our example above, once the jars for our project
> dependencies have been built, we can execute the compileJava task and
> resolve configurations.testCompile in parallel.
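The finer-grained DAG described above could be modelled as in the toy sketch below: each node is either a task node or a file-collection node, and a node is available once all of its dependencies have executed. Two workers would drain the two kinds of ready nodes concurrently; this sketch just computes which nodes are ready. All names are illustrative.

```java
import java.util.*;

// Toy model of the finer-grained DAG: TASK nodes are executed by the task
// thread, FILES nodes (file collections to resolve) by the resolution thread.
class FineGrainedDag {
    enum Kind { TASK, FILES }

    static class Node {
        final String name;
        final Kind kind;
        final Set<String> dependsOn;
        boolean executed;
        Node(String name, Kind kind, String... deps) {
            this.name = name; this.kind = kind;
            this.dependsOn = new LinkedHashSet<>(Arrays.asList(deps));
        }
    }

    // A node is available for execution once every dependency has executed.
    static List<String> ready(Map<String, Node> dag, Kind kind) {
        List<String> out = new ArrayList<>();
        for (Node n : dag.values()) {
            if (n.executed || n.kind != kind) continue;
            boolean depsDone = true;
            for (String d : n.dependsOn) depsDone &= dag.get(d).executed;
            if (depsDone) out.add(n.name);
        }
        return out;
    }
}
```

With this model, once the jar of a project dependency has been built, resolving the test compile configuration (a FILES node) and running compileJava (a TASK node) become ready at the same time, so the two threads can work on them in parallel.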
>
> What is interesting about this approach is that it is a nice step towards
> parallel task execution. There are a few technical hurdles we need to tackle
> before we can go fully parallel for tasks (there will be others, of course):
> * Dependency resolution is not thread safe
> * Incremental build does not work across multiple processes
> * Our progress reporting does not understand multiple things happening at
> the same time
> * Same for profiling
> * We need to be able to re-order task execution, but we don't know which
> tasks we can safely shuffle around
>
> By splitting out dependency resolution and up-to-date checking into
> separate DAG nodes that are executed by a separate thread, we can defer
> solving the first 2 issues, but still do things in parallel. Doing things in
> parallel will force us to start tackling the last 3 issues. Later, we can
> add additional task execution threads, or additional worker processes, each
> with a dependency resolution and task execution thread.
>
>
> Also, we could save some time if we can specify that some repositories will
> never have newer snapshot versions. There is no point checking maven central
> for a newer version of the same version number of anything if it is already
> cached. However, this isn't likely to offer much of a saving on small -
> medium projects.
>
>
> This is a good point. Some repositories have constraints on what is
> published there, e.g. don't go looking for snapshots at all in Maven central.
>
>
> Have we considered mapping dependencies to specific repositories? That is,
> specifying that a dependency can only come from certain repositories. This
> would make resolution faster for large projects, but is less convenient.
> Perhaps it could be optional, with the default behaviour being that a
> dependency will be searched for in each repository. Another way to achieve
> this might be to have include/exclude patterns on repositories that are
> checked before attempting to search a repository for a particular artifact.
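The include/exclude idea could be sketched as a cheap check performed before any repository search. The pattern syntax here (regexes matched against a "group:name" string) is purely illustrative.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical sketch: per-repository include/exclude patterns consulted
// before asking that repository for a module at all.
class RepositoryFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    RepositoryFilter include(String regex) { includes.add(Pattern.compile(regex)); return this; }
    RepositoryFilter exclude(String regex) { excludes.add(Pattern.compile(regex)); return this; }

    // Should this repository be searched for the given module?
    boolean shouldSearch(String module) {
        for (Pattern p : excludes) if (p.matcher(module).matches()) return false;
        if (includes.isEmpty()) return true;   // default: search everywhere
        for (Pattern p : includes) if (p.matcher(module).matches()) return true;
        return false;
    }
}
```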
>
>
> This is certainly an option.
>
We will eventually enable this feature in a post-1.0 release. But the
scenario for this would not be performance improvement; it would rather be
explicitness and exactness. For example, if there is just one dependency you
retrieve from a special repo, it would be nice to be explicit about that and
not create the impression that this repository is another general repo. Or
you have some legacy repo that contains a lot of crap that conflicts with
your other repo, but you still need it for some stuff. You might want to
isolate the crap.

Performance-wise we want to improve, and there is some stuff we should do.
But I wouldn't go crazy here. This is mostly painful right now because there
are unnecessary network lookups for dynamic revisions. Once this is fixed,
we are talking about the usually few builds where dynamic revisions have
timed out in the cache and need to be retrieved/rechecked.

Plus, you can always use a repository manager to make this very efficient.
Repository managers allow you to create any number of virtual repositories
that are an arbitrary aggregation of multiple physical ones. That way you
need one and only one repo from a Gradle perspective.

Hans
--
Hans Dockter
Founder, Gradle
http://www.gradle.org, http://twitter.com/gradleware
CEO, Gradleware - Gradle Training, Support, Consulting
http://www.gradleware.com
>
>
> --
> Adam Murdoch
> Gradle Co-founder
> http://www.gradle.org
> VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
> http://www.gradleware.com
>
>