Hi Adam, thanks for pointing this out.

I would like to add that there is a common workaround for the performance problem: use a repository manager, which provides a single (virtual) repository. That is not to say we shouldn't solve these performance problems, but many builds won't suffer from them because they use a single resolver.

I really like your caching strategy, and it will be great to see a consistent and easy-to-use TTL and offline policy mechanism.

Hans

--
Hans Dockter
Founder, Gradle
http://www.gradle.org, http://twitter.com/gradleware
CEO, Gradleware - Gradle Training, Support, Consulting
http://www.gradleware.com

On Mon, Sep 26, 2011 at 7:34 AM, Adam Murdoch <[email protected]> wrote:

> Hi,
>
> Currently, we use Ivy's ChainResolver to do 2 things:
> * Resolve a dependency descriptor (org, module, revision, config
> constraints) to a module descriptor (more or less an in-memory ivy.xml).
> * Resolve an artifact descriptor (org, module, revision, name, ext, etc) to
> a file.
>
> To resolve a dependency descriptor, ChainResolver iterates over its
> resolvers, asking each resolver to resolve the dependency descriptor. It
> does not stop if one of the resolvers happens to find the dependency and
> always continues to the end of the chain. Typically, the resolver will look
> first in its cache, and if not present, will hit the target repository (eg
> issue an HTTP GET request or whatever).
>
> This used to work fine performance-wise when we were using the default Ivy
> cache, as each resolver shared the same cached meta-data, and once any
> resolver resolved the dependency, then every resolver would find the
> dependency in the cache from then on. Now that we cache the meta-data
> per-resolver, only those resolvers that actually resolve the dependency will
> later find it in the cache.
>
> So, for example, say we have 2 remote repositories, maven central and
> repo.gradle.org, and a module published only to repo.gradle.org and not
> maven central, say, gradle-tooling-api. Every time we try to resolve
> gradle-tooling-api, the ChainResolver asks maven central resolver to resolve
> it. The maven central resolver doesn't find it in the cache, and hits maven
> central to find it. The ChainResolver then asks repo.gradle.org resolver,
> which finds it in the cache and returns it. Net result is that we hit maven
> central at least once per build looking for a module that we've already
> found in repo.gradle.org.
>
> ChainResolver does the same thing when resolving an artifact descriptor:
> Ask each resolver in turn. Each resolver looks in its cache, and hits the
> repository if not found. And suffers from the same problem now that we cache
> artifacts per resolver. In particular, this makes the IDE tasks really,
> really slow, as we ask each resolver in turn whether it has the -sources.jar
> and the -src.jar and so on.
>
> I'd like to replace this with the following:
>
> When resolving a dependency descriptor:
> * We look for a matching dependency in the cache for each resolver. We
> don't invoke the resolvers at this stage. If a matching cached value is
> found that has not expired, we use the cached value, and stop looking.
> * Otherwise, we attempt to resolve the dependency using each resolver in
> turn. We stop on the first resolver that can resolve the dependency.
> * If this fails, and an expired entry was found in the cache, use that
> instead.
> * We remember which resolver found the module, for downloading the
> artifacts later.
>
> When resolving an artifact, we delegate directly to the resolver where the
> artifact's module was found.
>
> Also, we apply the same expiry time for all dynamic revisions.
> This includes snapshot revisions, changing = true, anything that matches a
> changing pattern, version ranges, dynamic revisions (1.2+, etc) and statuses
> (latest.integration). When we resolve a dynamic dependency descriptor, we
> persist the module that we ended up resolving to, and we use that value
> until the expiry is reached.
>
> Some implications of this:
>
> * We're making a performance-accuracy trade-off here. Which means we'll
> probably need some way to tweak the behaviour. Not sure exactly how this
> might look, yet. I'd start with a simple time-to-live property on each
> configuration, and let the use cases drive anything beyond that.
>
> * For dynamic revisions, stopping on the first resolver means we may miss a
> newer revision that happens to be in a later repository. An alternate
> approach might be to use all resolvers for dynamic revisions, but only when
> there is no unexpired value in the cache. We could do this search in
> parallel, and just pick the latest out of those we have found at the end of
> some timeout. Perhaps we could do the search in parallel for all revisions,
> dynamic or not.
>
> * We fetch artifacts only from the same repository as their module was
> found in (but this repository can have multiple patterns/base urls/etc). I
> think this is a good thing, from an accuracy/repeatability point of view.
>
> * The fact that all dynamic revisions have the same time-to-live is a
> change in behaviour, but a good one, I think.
>
> * Offline mode becomes cheap to implement. We just skip asking the
> resolvers. Plus, it makes offline mode less important, because we make more
> effort to use whatever is cached.
>
> * Caching moves out of the resolvers and becomes a decoration that we apply
> consistently. This means less effort to implement a resolver in the future.
>
> Thoughts? I want to get on to this as soon as milestone 5 is out.
>
>
> --
> Adam Murdoch
> Gradle Co-founder
> http://www.gradle.org
> VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
> http://www.gradleware.com
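
To make the proposed lookup order concrete, here is a minimal Java sketch of the steps described in the quoted message for resolving a dependency descriptor. It is an illustration only, not Gradle or Ivy code: `ChainLookupSketch`, `Repo`, `Entry`, the string cache key and the single `ttlMillis` value are all hypothetical names made up for the example, and the remote repositories are simulated with in-memory sets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChainLookupSketch {
    /** Hypothetical stand-in for a repository resolver; this is not Ivy's API. */
    static class Repo {
        final String name;
        final Set<String> modules;   // modules this simulated repository can serve
        int remoteHits = 0;          // counts simulated remote requests

        Repo(String name, Set<String> modules) {
            this.name = name;
            this.modules = modules;
        }

        boolean resolve(String module) {
            remoteHits++;
            return modules.contains(module);
        }
    }

    /** Cached result: which repository found the module, and when. */
    static class Entry {
        final String repoName;
        final long storedAt;

        Entry(String repoName, long storedAt) {
            this.repoName = repoName;
            this.storedAt = storedAt;
        }
    }

    final List<Repo> repos;
    final long ttlMillis;
    // Assumed cache layout: one entry per (repository, module) pair.
    final Map<String, Entry> cache = new HashMap<>();

    ChainLookupSketch(List<Repo> repos, long ttlMillis) {
        this.repos = repos;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the name of the repository to fetch artifacts from, or null if unresolved. */
    String resolve(String module, long now) {
        Entry expired = null;
        // 1. Look in every repository's cache first; no resolver is invoked here.
        for (Repo repo : repos) {
            Entry e = cache.get(repo.name + "|" + module);
            if (e == null) continue;
            if (now - e.storedAt < ttlMillis) return e.repoName; // unexpired: stop looking
            if (expired == null) expired = e;                    // keep as a fallback
        }
        // 2. Ask each repository in turn, stopping at the first one that resolves it.
        for (Repo repo : repos) {
            if (repo.resolve(module)) {
                cache.put(repo.name + "|" + module, new Entry(repo.name, now));
                return repo.name; // remembered for downloading artifacts later
            }
        }
        // 3. Nothing resolved it: fall back to an expired cache entry if one was seen.
        return expired == null ? null : expired.repoName;
    }

    public static void main(String[] args) {
        Repo central = new Repo("maven-central", new HashSet<>(Set.of("junit")));
        Repo gradle = new Repo("repo.gradle.org", new HashSet<>(Set.of("gradle-tooling-api")));
        ChainLookupSketch chain = new ChainLookupSketch(List.of(central, gradle), 1000);

        System.out.println(chain.resolve("gradle-tooling-api", 0));   // repo.gradle.org
        System.out.println(chain.resolve("gradle-tooling-api", 500)); // repo.gradle.org, from cache
        System.out.println(central.remoteHits);                       // 1
    }
}
```

In this sketch the second lookup never touches maven central, which is the behaviour the proposal is after. Skipping step 2 entirely would be one way to model the offline mode mentioned in the message.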
