Hi,

Currently, we use Ivy's ChainResolver to do 2 things:

* Resolve a dependency descriptor (org, module, revision, config constraints) to a module descriptor (more or less an in-memory ivy.xml).
* Resolve an artifact descriptor (org, module, revision, name, ext, etc) to a file.

To resolve a dependency descriptor, ChainResolver iterates over its resolvers, asking each resolver to resolve the dependency descriptor. It does not stop if one of the resolvers happens to find the dependency, and always continues to the end of the chain. Typically, the resolver will look first in its cache, and if not present, will hit the target repository (e.g. issue an HTTP GET request or whatever).

This used to work fine performance-wise when we were using the default Ivy cache, as each resolver shared the same cached meta-data, and once any resolver resolved the dependency, then every resolver would find the dependency in the cache from then on. Now that we cache the meta-data per-resolver, only those resolvers that actually resolve the dependency will later find it in the cache.

So, for example, say we have 2 remote repositories, Maven Central and repo.gradle.org, and a module published only to repo.gradle.org and not Maven Central, say, gradle-tooling-api. Every time we try to resolve gradle-tooling-api, the ChainResolver asks the Maven Central resolver to resolve it. The Maven Central resolver doesn't find it in the cache, and hits Maven Central to find it. The ChainResolver then asks the repo.gradle.org resolver, which finds it in the cache and returns it. Net result is that we hit Maven Central at least once per build looking for a module that we've already found in repo.gradle.org.

ChainResolver does the same thing when resolving an artifact descriptor: it asks each resolver in turn. Each resolver looks in its cache, and hits the repository if not found. So it suffers from the same problem now that we cache artifacts per resolver. In particular, this makes the IDE tasks really, really slow, as we ask each resolver in turn whether it has the -sources.jar and the -src.jar and so on.

I'd like to replace this with the following:

When resolving a dependency descriptor:

* We look for a matching dependency in the cache for each resolver.
  We don't invoke the resolvers at this stage. If a matching cached value is found that has not expired, we use the cached value, and stop looking.
* Otherwise, we attempt to resolve the dependency using each resolver in turn. We stop on the first resolver that can resolve the dependency.
* If this fails, and an expired entry was found in the cache, use that instead.
* We remember which resolver found the module, for downloading the artifacts later.

When resolving an artifact, we delegate directly to the resolver where the artifact's module was found.

Also, we apply the same expiry time for all dynamic revisions. This includes snapshot revisions, changing = true, anything that matches a changing pattern, version ranges, dynamic revisions (1.2+, etc) and statuses (latest.integration). When we resolve a dynamic dependency descriptor, we persist the module that we ended up resolving to, and we use that value until the expiry is reached.

Some implications of this:

* We're making a performance-accuracy trade-off here. Which means we'll probably need some way to tweak the behaviour. Not sure exactly how this might look, yet. I'd start with a simple time-to-live property on each configuration, and let the use cases drive anything beyond that.
* For dynamic revisions, stopping on the first resolver means we may miss a newer revision that happens to be in a later repository. An alternate approach might be to use all resolvers for dynamic revisions, but only when there is no unexpired value in the cache. We could do this search in parallel, and just pick the latest out of those we have found at the end of some timeout. Perhaps we could do the search in parallel for all revisions, dynamic or not.
* We fetch artifacts only from the same repository as their module was found in (but this repository can have multiple patterns/base urls/etc). I think this is a good thing, from an accuracy/repeatability point of view.
* The fact that all dynamic revisions have the same time-to-live is a change in behaviour, but a good one, I think.
* Offline mode becomes cheap to implement. We just skip asking the resolvers. Plus, it makes offline mode less important, because we make more effort to use whatever is cached.
* Caching moves out of the resolvers and becomes a decoration that we apply consistently. This means less effort to implement a resolver in the future.

Thoughts? I want to get on to this as soon as milestone 5 is out.

--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
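
PS. To make the proposed lookup order for dependency descriptors concrete, here is a rough sketch in Java. It is only an illustration under stated assumptions: every name in it (CachingChainSketch, Repo, FakeRepo, Entry, Resolved) is made up and is not an Ivy or Gradle API, descriptors are plain strings, time is passed in explicitly, and a single time-to-live is applied to every cache entry to keep it short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch of the proposed lookup order. All names are hypothetical, not Ivy or Gradle APIs. */
final class CachingChainSketch {

    /** Stand-in for a resolver; resolve() may hit the network and returns empty if not found. */
    interface Repo {
        String name();
        Optional<String> resolve(String moduleId);
    }

    /** In-memory fake repository that counts how often it is actually asked. */
    static final class FakeRepo implements Repo {
        final String name;
        final List<String> modules;
        int calls = 0;
        FakeRepo(String name, String... modules) {
            this.name = name;
            this.modules = Arrays.asList(modules);
        }
        public String name() { return name; }
        public Optional<String> resolve(String moduleId) {
            calls++;
            return modules.contains(moduleId) ? Optional.of(moduleId + "@" + name) : Optional.empty();
        }
    }

    /** A cached descriptor and the time it was stored. */
    static final class Entry {
        final String descriptor;
        final long storedAtMillis;
        Entry(String descriptor, long storedAtMillis) {
            this.descriptor = descriptor;
            this.storedAtMillis = storedAtMillis;
        }
    }

    /** A descriptor plus the repository it came from, remembered for artifact downloads. */
    static final class Resolved {
        final String descriptor;
        final String repoName;
        Resolved(String descriptor, String repoName) {
            this.descriptor = descriptor;
            this.repoName = repoName;
        }
    }

    private final List<? extends Repo> repos;
    private final long ttlMillis;
    // Per-resolver cache: repository name -> module id -> entry.
    private final Map<String, Map<String, Entry>> cache = new HashMap<>();

    CachingChainSketch(List<? extends Repo> repos, long ttlMillis) {
        this.repos = repos;
        this.ttlMillis = ttlMillis;
    }

    Optional<Resolved> resolve(String moduleId, long nowMillis) {
        Resolved expired = null;
        // 1. Look in each resolver's cache, without invoking any resolver.
        for (Repo repo : repos) {
            Map<String, Entry> perRepo = cache.get(repo.name());
            Entry entry = perRepo == null ? null : perRepo.get(moduleId);
            if (entry == null) continue;
            if (nowMillis - entry.storedAtMillis < ttlMillis) {
                return Optional.of(new Resolved(entry.descriptor, repo.name()));
            }
            if (expired == null) expired = new Resolved(entry.descriptor, repo.name());
        }
        // 2. Ask each resolver in turn and stop at the first one that finds the module.
        for (Repo repo : repos) {
            Optional<String> found = repo.resolve(moduleId);
            if (found.isPresent()) {
                cache.computeIfAbsent(repo.name(), k -> new HashMap<>())
                     .put(moduleId, new Entry(found.get(), nowMillis));
                return Optional.of(new Resolved(found.get(), repo.name()));
            }
        }
        // 3. Nothing resolved: fall back to an expired cache entry, if there was one.
        return Optional.ofNullable(expired);
    }

    public static void main(String[] args) {
        FakeRepo central = new FakeRepo("central");
        FakeRepo gradle = new FakeRepo("repo.gradle.org", "gradle-tooling-api");
        CachingChainSketch chain = new CachingChainSketch(Arrays.asList(central, gradle), 60_000L);
        chain.resolve("gradle-tooling-api", 0L);      // asks both repositories once
        chain.resolve("gradle-tooling-api", 1_000L);  // served from the cache, no repository asked
        System.out.println("central hits: " + central.calls + ", repo.gradle.org hits: " + gradle.calls);
    }
}
```

In the gradle-tooling-api example, the second lookup is answered from the repo.gradle.org cache entry, so the central stand-in is not asked again until the entry expires. Offline mode would amount to skipping step 2.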
