Thanks, Brett, for the input. I can confirm that with black and white lists in place, the case where all remote repos are searched sequentially and the artifact is still not found at the end is rather rare. However, it is typical for some scenarios, e.g. when you enable source jars to be downloaded for a project. Out of 40 dependencies, maybe 5 will have source jars available. In that situation a simple Maven goal can take 30 minutes or more.
I mentioned the timeout only to have a maximum value. Usually the requests don't run into the timeout (except when the repo is down); the average response time is maybe 3-4 secs (for our installation). It is also clear that the first-serve concept conflicts with the existing concept of an (ordered) list of repos that is searched. But can we not assume that artifacts with a given specification are identical regardless of which repo they come from, provided the hash matches?

Btw., this brings up another idea: could the ASF possibly grant "official" certificates for remote repos? That way, Archiva could distinguish between trusted and untrusted repos. For companies, this would be a compelling feature! Working for insurance companies and banks, I often hear the argument "oh boy - they are downloading software from some obscure server in Russia". Having the label "Certified Maven Repository" would surely quiet those voices :-) The ASF could release a rule set that a Maven repo must conform to in order to earn the "certified" label. Or even better, the ASF could offer a VMware image that includes all the software needed to run the Maven repo, ready to go - including some logic to verify that known artifacts are mirrored correctly. Total control of remote repos is not possible, of course, but the contract between Archiva and the remote repo could be tightened considerably.

Back to the concurrent-requests idea: sending a HEAD request before the actual GET is surely a good idea. Archiva could then decide which repo to send the GET to based on the shortest response time. Anyway, this feature needs more brainstorming...

brettporter wrote:
> 
> On 15/10/2009, at 12:06 AM, Marc Lustig wrote:
> 
>> 
>> Hi all,
>> 
>> we have configured about 25 remote-repos for our public-artifacts managed
>> repo.
>> In certain cases, black and white lists don't help and a request is proxied
>> to all the 20 remote-repos _sequentially_.
>> Even though we have configured a
>> short timeout of 5 secs, this takes 125 secs in case the artifact
>> doesn't exist in any remote-repo - per artifact!
>> 
>> So I was wondering if it would make sense to send requests to all of the
>> remote-repos _concurrently_.
>> The first thread that finds the artifact could cause all the other threads
>> to cancel the http-request.
>> The total request time would reduce from 100 secs++ to merely 5 secs.
>> Tremendous win or?
>> 
>> Has this been discussed before?
> 
> I think this is a pretty unusual case. I don't quite understand why
> you are hitting the timeout limit on the remote repo - if they are up
> they should be fast. Also, "first that finds" is different to the
> current rule since it's first that appears in the list. I worry that
> in this set up you're not entirely sure which repository the artifacts
> are meant to be coming from, so maybe it points to another problem.
> 
>> Is there an argument against this strategy?
> 
> Particularly if we turned on streaming of the proxied download to the
> client (which is intended) - we couldn't do so if they were pooled
> like this, unless we accepted the "first found rule".
> 
> That said, this might speed up requests with a long list of proxies,
> even if they are functioning properly. So it might be reasonable as an
> optional capability. One thing to consider would be doing a HEAD
> request instead of a GET for all the remotes first to select where to
> download from, then execute the GET from the desired one.
> 
> - Brett

-- 
View this message in context: http://www.nabble.com/Proposal%3A-concurrent-remote-requests-tp25890731p25904406.html
Sent from the archiva-dev mailing list archive at Nabble.com.
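P.S. The "first thread that finds the artifact cancels the rest" idea can be sketched in a few lines of Java (Archiva's language). This is only an illustration, not Archiva code: the repo URLs are made up, and the simulated latency stands in for the real HTTP round trip. The key point is that `ExecutorService.invokeAny` returns the first successful result and cancels the remaining tasks, so the wall-clock time is roughly the fastest hit instead of the sum of all timeouts.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the first-found-wins proxy lookup. Names are illustrative only.
public class FirstFoundFetch {

    // A lookup task: returns the repo URL that has the artifact, or throws if absent.
    static Callable<String> lookup(String repoUrl, long simulatedLatencyMs, boolean hasArtifact) {
        return () -> {
            Thread.sleep(simulatedLatencyMs);  // stands in for the remote round trip
            if (!hasArtifact) throw new Exception("404 from " + repoUrl);
            return repoUrl;
        };
    }

    // invokeAny returns the first task that completes successfully and
    // cancels the others; the timeout caps the whole fan-out (cf. the 5s proxy timeout).
    static String fetchFromAny(List<Callable<String>> lookups) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(lookups.size());
        try {
            return pool.invokeAny(lookups, 5, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        String winner = fetchFromAny(List.of(
                lookup("https://repo-a.example/", 300, false),   // miss
                lookup("https://repo-b.example/", 150, true),    // fastest hit
                lookup("https://repo-c.example/", 400, true)));  // slower hit
        System.out.println(winner);
    }
}
```

Note that this is exactly the "first found rule" Brett mentions: it gives up the ordered-repo-list semantics, which is why it would only be safe under the assumption above that equal coordinates plus a matching hash mean an identical artifact.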

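Brett's HEAD-before-GET variant could look like the following sketch (again illustrative, not Archiva API): fire a cheap existence probe at every remote concurrently, take completions in the order they arrive, and download from the first repo that reports a hit. Since completion order is response-time order, this also implements the "shortest response time" selection. A `Thread.sleep` simulates the HEAD round trip; a failed probe is treated as fatal here for brevity.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of HEAD-then-GET repo selection. Repo names and probe() are made up.
public class HeadThenGet {

    record Probe(String repoUrl, boolean exists) {}

    static Callable<Probe> probe(String repoUrl, long latencyMs, boolean exists) {
        return () -> {
            Thread.sleep(latencyMs);  // stands in for the HEAD request
            return new Probe(repoUrl, exists);
        };
    }

    // Take probes in completion order; the first one reporting a hit is the
    // fastest repo that has the artifact, and the GET would go there.
    static String selectRepo(List<Callable<Probe>> probes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        CompletionService<Probe> cs = new ExecutorCompletionService<>(pool);
        probes.forEach(cs::submit);
        try {
            for (int i = 0; i < probes.size(); i++) {
                Probe p = cs.take().get();           // next probe to finish
                if (p.exists()) return p.repoUrl();  // fastest repo with a hit
            }
            return null;                             // artifact not found anywhere
        } finally {
            pool.shutdownNow();                      // cancel outstanding probes
        }
    }

    public static void main(String[] args) throws Exception {
        String repo = selectRepo(List.of(
                probe("https://fast-but-empty.example/", 100, false),
                probe("https://slower-hit.example/", 250, true)));
        System.out.println(repo);
    }
}
```

Unlike the first-found GET, this keeps the door open for streaming the actual download to the client, since only one GET is ever in flight.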