Hi Adam,

thanks for pointing this out.

I would like to add that there is often a common workaround for the
performance problem: use a repository manager, which will provide you with a
single (virtual) repository. That doesn't mean we shouldn't solve these
performance problems, but many builds won't suffer from them because they use
a single resolver.

I really like your caching strategy and it will be great to see a consistent
and easy-to-use TTL and offline policy mechanism.

Hans

--
Hans Dockter
Founder, Gradle
http://www.gradle.org, http://twitter.com/gradleware
CEO, Gradleware - Gradle Training, Support, Consulting
http://www.gradleware.com


On Mon, Sep 26, 2011 at 7:34 AM, Adam Murdoch
<[email protected]> wrote:

> Hi,
>
> Currently, we use Ivy's ChainResolver to do 2 things:
> * Resolve a dependency descriptor (org, module, revision, config
> constraints) to a module descriptor (more or less an in-memory ivy.xml).
> * Resolve an artifact descriptor (org, module, revision, name, ext, etc) to
> a file.
>
> To resolve a dependency descriptor, ChainResolver iterates over its
> resolvers, asking each resolver to resolve the dependency descriptor. It
> does not stop if one of the resolvers happens to find the dependency; it
> always continues to the end of the chain. Typically, each resolver will look
> first in its cache and, if the dependency is not there, will hit the target
> repository (e.g. issue an HTTP GET request or whatever).
>
> This used to work fine performance-wise when we were using the default Ivy
> cache: all resolvers shared the same cached meta-data, so once any resolver
> had resolved the dependency, every resolver would find it in the cache from
> then on. Now that we cache the meta-data per resolver, only those resolvers
> that actually resolve the dependency will later find it in the cache.
>
> So, for example, say we have two remote repositories, Maven Central and
> repo.gradle.org, and a module published only to repo.gradle.org and not to
> Maven Central, say, gradle-tooling-api. Every time we try to resolve
> gradle-tooling-api, the ChainResolver asks the Maven Central resolver to
> resolve it. The Maven Central resolver doesn't find it in its cache, so it
> hits Maven Central to look for it. The ChainResolver then asks the
> repo.gradle.org resolver, which finds it in its cache and returns it. The net
> result is that we hit Maven Central at least once per build looking for a
> module that we've already found in repo.gradle.org.
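To make the per-resolver caching problem concrete, here is a small hypothetical simulation in Java. The class and method names are made up, and it models neither Ivy's real caching code nor the network. It assumes that each resolver checks a cache before hitting its repository, that the chain always walks every resolver, and that misses are not cached, and it counts how often Maven Central is hit over several builds with a shared cache versus per-resolver caches.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation, not Ivy code: each resolver checks a cache and, on a
// miss, sends one request to its repository. Like today's ChainResolver, the
// chain never stops early, and only hits (not misses) are cached.
class ChainSimulation {
    static final class Repo {
        final String name;
        final Set<String> modules;
        int remoteRequests = 0;

        Repo(String name, Set<String> modules) {
            this.name = name;
            this.modules = modules;
        }
    }

    // Cache keys are "<cache name>:<module>"; the cache name is "shared" when all
    // resolvers share one cache, otherwise the resolver's own name.
    private final Set<String> cache = new HashSet<>();
    private final List<Repo> repos;
    private final boolean sharedCache;

    ChainSimulation(List<Repo> repos, boolean sharedCache) {
        this.repos = repos;
        this.sharedCache = sharedCache;
    }

    boolean resolve(String module) {
        boolean found = false;
        for (Repo repo : repos) {                 // always walks the whole chain
            String key = (sharedCache ? "shared" : repo.name) + ":" + module;
            if (cache.contains(key)) {
                found = true;
                continue;
            }
            repo.remoteRequests++;                // cache miss: hit the repository
            if (repo.modules.contains(module)) {
                cache.add(key);
                found = true;
            }
        }
        return found;
    }

    /** Runs the given number of builds and returns how often Maven Central was hit. */
    static int centralRequests(boolean sharedCache, int builds) {
        Repo central = new Repo("central", Set.of());
        Repo gradle = new Repo("repo.gradle.org", Set.of("gradle-tooling-api"));
        ChainSimulation chain = new ChainSimulation(List.of(central, gradle), sharedCache);
        for (int i = 0; i < builds; i++) {
            chain.resolve("gradle-tooling-api");
        }
        return central.remoteRequests;
    }

    public static void main(String[] args) {
        System.out.println("shared cache:       " + centralRequests(true, 5));
        System.out.println("per-resolver cache: " + centralRequests(false, 5));
    }
}
```

Under these assumptions, five builds hit Maven Central once with the shared cache and five times with per-resolver caches.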
>
> ChainResolver does the same thing when resolving an artifact descriptor: it
> asks each resolver in turn, each resolver looks in its cache and hits the
> repository if the artifact is not found there, and so it suffers from the
> same problem now that we cache artifacts per resolver. In particular, this
> makes the IDE tasks really, really slow, as we ask each resolver in turn
> whether it has the -sources.jar and the -src.jar and so on.
>
> I'd like to replace this with the following:
>
> When resolving a dependency descriptor:
> * We look for a matching dependency in the cache for each resolver. We
> don't invoke the resolvers at this stage. If a matching cached value is
> found that has not expired, we use the cached value, and stop looking.
> * Otherwise, we attempt to resolve the dependency using each resolver in
> turn. We stop on the first resolver that can resolve the dependency.
> * If this fails, and an expired entry was found in the cache, we use that
> instead.
> * We remember which resolver found the module, for downloading the
> artifacts later.
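The proposed lookup order could be sketched roughly as follows. This is a hypothetical Java sketch, not Gradle or Ivy code: `Repository`, `Resolved` and `ProposedResolver` are made-up names, each repository's cache is a plain map keyed by repository and dependency, and the current time is passed in as `now` so the example stays deterministic. The `offline` flag models the later remark that offline mode only needs to skip asking the resolvers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; these are not Gradle or Ivy APIs.
class Repository {
    final String name;
    final Map<String, String> descriptors;  // dependency -> module descriptor
    int requests = 0;                        // remote requests made so far

    Repository(String name, Map<String, String> descriptors) {
        this.name = name;
        this.descriptors = descriptors;
    }

    /** Stands in for a remote lookup such as an HTTP GET; null means not found. */
    String fetch(String dependency) {
        requests++;
        return descriptors.get(dependency);
    }
}

class Resolved {
    final String descriptor;
    final Repository source;  // remembered so artifacts can be fetched from here

    Resolved(String descriptor, Repository source) {
        this.descriptor = descriptor;
        this.source = source;
    }
}

class ProposedResolver {
    private static final class Entry {
        final String descriptor;
        final long cachedAt;

        Entry(String descriptor, long cachedAt) {
            this.descriptor = descriptor;
            this.cachedAt = cachedAt;
        }
    }

    private final List<Repository> repositories;
    private final long ttlMillis;
    private final boolean offline;
    // Meta-data is still cached per repository: key is "<repository>|<dependency>".
    private final Map<String, Entry> cache = new HashMap<>();

    ProposedResolver(List<Repository> repositories, long ttlMillis, boolean offline) {
        this.repositories = repositories;
        this.ttlMillis = ttlMillis;
        this.offline = offline;
    }

    /** Returns the resolved module, or null if it could not be resolved. */
    Resolved resolve(String dependency, long now) {
        Resolved expired = null;
        // 1. Check every repository's cache without invoking any resolver.
        for (Repository repo : repositories) {
            Entry entry = cache.get(repo.name + "|" + dependency);
            if (entry == null) {
                continue;
            }
            if (now - entry.cachedAt < ttlMillis) {
                return new Resolved(entry.descriptor, repo);  // unexpired: stop here
            }
            if (expired == null) {
                expired = new Resolved(entry.descriptor, repo);
            }
        }
        // 2. Ask each repository in turn, stopping at the first that has the module.
        if (!offline) {
            for (Repository repo : repositories) {
                String descriptor = repo.fetch(dependency);
                if (descriptor != null) {
                    cache.put(repo.name + "|" + dependency, new Entry(descriptor, now));
                    return new Resolved(descriptor, repo);
                }
            }
        }
        // 3. Nothing found remotely (or offline): fall back to an expired entry, if any.
        return expired;
    }

    public static void main(String[] args) {
        Repository central = new Repository("central", new HashMap<>());
        Repository gradle = new Repository("repo.gradle.org",
                new HashMap<>(Map.of("gradle-tooling-api", "ivy.xml for gradle-tooling-api")));
        ProposedResolver resolver = new ProposedResolver(List.of(central, gradle), 1000, false);
        resolver.resolve("gradle-tooling-api", 0);    // misses central, found in repo.gradle.org
        resolver.resolve("gradle-tooling-api", 500);  // unexpired cache hit: no requests
        System.out.println("central requests: " + central.requests);  // 1
    }
}
```

Keeping the `source` repository in `Resolved` is what would let artifact downloads go straight to the repository that supplied the module.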
>
> When resolving an artifact, we delegate directly to the resolver where the
> artifact's module was found.
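A rough illustration of why this would help the IDE tasks, again as a hypothetical Java sketch: `ArtifactLookup`, `Repo` and the `foo-1.0-*.jar` file names are invented, and each `has()` call stands for one remote request, with misses not cached. For a sources jar that no repository has, walking the chain costs one request per repository per candidate name, while delegating to the module's repository costs one request per candidate name.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration; not Gradle or Ivy code. Each call to
// has() stands for one remote request, and misses are not cached.
class ArtifactLookup {
    static final class Repo {
        final Set<String> files;
        int requests = 0;

        Repo(Set<String> files) {
            this.files = files;
        }

        boolean has(String file) {
            requests++;
            return files.contains(file);
        }
    }

    /** Chain behaviour: try each candidate name against each repository in turn. */
    static String viaChain(List<Repo> repos, List<String> candidates) {
        for (String candidate : candidates) {
            for (Repo repo : repos) {
                if (repo.has(candidate)) {
                    return candidate;
                }
            }
        }
        return null;
    }

    /** Proposed: ask only the repository the module itself was resolved from. */
    static String viaModuleSource(Repo source, List<String> candidates) {
        for (String candidate : candidates) {
            if (source.has(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("foo-1.0-sources.jar", "foo-1.0-src.jar");
        List<Repo> chain = List.of(new Repo(Set.of()), new Repo(Set.of()), new Repo(Set.of("foo-1.0.jar")));
        viaChain(chain, candidates);
        int chainRequests = chain.stream().mapToInt(r -> r.requests).sum();
        Repo source = new Repo(Set.of("foo-1.0.jar"));
        viaModuleSource(source, candidates);
        System.out.println(chainRequests + " vs " + source.requests);  // 6 vs 2
    }
}
```

With three repositories and two candidate names, `main` prints `6 vs 2`; the gap grows with every extra repository in the chain.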
>
> Also, we apply the same expiry time to all dynamic revisions. This
> includes snapshot revisions, modules with changing = true, anything that
> matches a changing pattern, version ranges, dynamic revisions (1.2+, etc.) and statuses
> (latest.integration). When we resolve a dynamic dependency descriptor, we
> persist the module that we ended up resolving to, and we use that value
> until the expiry is reached.
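One way to picture a single expiry for every kind of dynamic revision is a cache keyed by the requested selector that stores the revision it resolved to. This is a hypothetical Java sketch: `DynamicRevisionCache` is a made-up name, the `isDynamic` rules are simplified assumptions based on the list above (changing patterns are folded into the `changing` flag), and expiry is a plain time-to-live in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Gradle code. One time-to-live covers every kind of
// dynamic revision; the classification rules below are simplified assumptions.
class DynamicRevisionCache {
    /** `changing` stands for both changing = true and a matching changing pattern. */
    static boolean isDynamic(String revision, boolean changing) {
        return changing
                || revision.endsWith("-SNAPSHOT")   // snapshot revisions
                || revision.endsWith("+")           // e.g. 1.2+
                || revision.startsWith("latest.")   // statuses, e.g. latest.integration
                || revision.startsWith("[")         // version ranges, e.g. [1.0,2.0)
                || revision.startsWith("]")
                || revision.startsWith("(");
    }

    private static final class Entry {
        final String resolvedRevision;
        final long cachedAt;

        Entry(String resolvedRevision, long cachedAt) {
            this.resolvedRevision = resolvedRevision;
            this.cachedAt = cachedAt;
        }
    }

    private final long ttlMillis;  // the same expiry for all dynamic revisions
    private final Map<String, Entry> entries = new HashMap<>();

    DynamicRevisionCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Persists the revision a dynamic selector resolved to, e.g. 1.2+ -> 1.2.3. */
    void remember(String module, String selector, String resolvedRevision, long now) {
        entries.put(module + "|" + selector, new Entry(resolvedRevision, now));
    }

    /** Returns the persisted revision until it expires, then null (resolve again). */
    String lookup(String module, String selector, long now) {
        Entry entry = entries.get(module + "|" + selector);
        return entry != null && now - entry.cachedAt < ttlMillis ? entry.resolvedRevision : null;
    }

    public static void main(String[] args) {
        DynamicRevisionCache cache = new DynamicRevisionCache(60_000);
        cache.remember("org.example:foo", "1.2+", "1.2.3", 0);
        System.out.println(cache.lookup("org.example:foo", "1.2+", 30_000));  // 1.2.3
        System.out.println(cache.lookup("org.example:foo", "1.2+", 60_000));  // null
    }
}
```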
>
> Some implications of this:
>
> * We're making a performance-accuracy trade-off here, which means we'll
> probably need some way to tweak the behaviour. I'm not sure exactly how this
> might look yet. I'd start with a simple time-to-live property on each
> configuration, and let the use cases drive anything beyond that.
>
> * For dynamic revisions, stopping on the first resolver means we may miss a
> newer revision that happens to be in a later repository. An alternative
> approach might be to use all resolvers for dynamic revisions, but only when
> there is no unexpired value in the cache. We could do this search in
> parallel, and just pick the latest out of those we have found at the end of
> some timeout. Perhaps we could do the search in parallel for all revisions,
> dynamic or not.
>
> * We fetch artifacts only from the same repository that their module was
> found in (though this repository can have multiple patterns, base URLs,
> etc.). I think this is a good thing from an accuracy/repeatability point of
> view.
>
> * The fact that all dynamic revisions have the same time-to-live is a
> change in behaviour, but a good one, I think.
>
> * Offline mode becomes cheap to implement. We just skip asking the
> resolvers. Plus, it makes offline mode less important, because we make more
> effort to use whatever is cached.
>
> * Caching moves out of the resolvers and becomes a decoration that we apply
> consistently. This means less effort to implement a resolver in the future.
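The parallel alternative from the point about dynamic revisions (ask every resolver at once and take the latest revision found before a timeout) could look something like this hypothetical Java sketch. `ParallelDynamicSearch` is a made-up name, each `Callable` stands in for asking one repository for its newest matching revision, and `compareVersions` only handles purely numeric dotted versions, which is a simplifying assumption rather than Gradle's version ordering. It relies on `ExecutorService.invokeAll` with a timeout, which cancels any tasks still running when the timeout expires.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Gradle code.
class ParallelDynamicSearch {
    /** Compares purely numeric dotted versions such as 1.2.10 (a simplifying assumption). */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int p = i < x.length ? Integer.parseInt(x[i]) : 0;
            int q = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (p != q) {
                return Integer.compare(p, q);
            }
        }
        return 0;
    }

    /**
     * Runs one query per repository in parallel. Each query returns the newest
     * matching revision that repository has, or null. Queries still running when
     * the timeout expires are cancelled and ignored; the newest answer wins.
     */
    static String newest(List<Callable<String>> queries, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        String best = null;
        try {
            for (Future<String> future : pool.invokeAll(queries, timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    String candidate = future.get();  // every future is done at this point
                    if (candidate != null && (best == null || compareVersions(candidate, best) > 0)) {
                        best = candidate;
                    }
                } catch (CancellationException | ExecutionException e) {
                    // This repository timed out or failed: ignore it.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
        } finally {
            pool.shutdownNow();
        }
        return best;
    }

    public static void main(String[] args) {
        List<Callable<String>> queries = List.of(
                () -> "1.2.3",                                // e.g. Maven Central
                () -> "1.2.10",                               // e.g. repo.gradle.org
                () -> { Thread.sleep(5_000); return "9.9"; }  // too slow: cancelled
        );
        System.out.println(newest(queries, 500));  // 1.2.10
    }
}
```

A repository that answers after the timeout is simply ignored, so a slow repository costs at most the timeout rather than blocking resolution; the price is that its newer revision, if any, is missed until the next lookup.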
>
> Thoughts? I want to get on to this as soon as milestone 5 is out.
>
>
> --
> Adam Murdoch
> Gradle Co-founder
> http://www.gradle.org
> VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
> http://www.gradleware.com
>
>
