Sorry for the cross post but this seems relevant to both these groups.

I was thinking about the subject of mirroring and redirection for the ASF Repository. There was some discussion of this on the Depot list recently, and I feel we could address the subject again in the interest of both groups.

www.apache.org/dyn/closer.cgi provides a simple resolution strategy that attempts to determine the closest mirror available to the client browser. It then generates an HTML page from a template that lists the selected mirror as well as the other available mirrors. With Depot, we have a customized download client that could be extended to manage downloading from a list of mirrors as well.

Here are my thoughts on this subject:

A.) The script is really not that big (90% of it is just parsing the mirrors file), and the database (a flat text file called mirrors.list) is not very big either. While closer.cgi is a neat service for browsers, it's not exactly helpful for automated clients. Yet mirrors.list is an excellent example of metadata that is exposed in an effective manner, such that automated clients can access it.

http://www.apache.org/mirrors/mirrors.list

I'm fairly convinced that it would be simple to create a client implementation that accomplishes the same functionality as closer.cgi programmatically, so that it could be used to resolve a location to download from when mirrors are available.
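
As a rough sketch of what such a client could look like (in Python, purely for illustration), the code below parses a mirror list and picks a mirror in the client's country, falling back to any mirror with the requested protocol. The column layout it assumes (protocol, country code, base URL, then optional contact fields) and the names Mirror, parse_mirrors and choose_mirror are my own assumptions. It is not a description of what closer.cgi actually does.

```python
import random
from typing import List, NamedTuple, Optional


class Mirror(NamedTuple):
    protocol: str
    country: str
    url: str


def parse_mirrors(text: str) -> List[Mirror]:
    """Parse mirror entries, ASSUMING 'protocol country url [contact...]' per line."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore lines that do not fit the assumed layout
        mirrors.append(Mirror(fields[0].lower(), fields[1].lower(), fields[2]))
    return mirrors


def choose_mirror(mirrors: List[Mirror], country: str,
                  protocol: str = "http", rng=random) -> Optional[Mirror]:
    """Pick a random mirror in the client's country, else any mirror with that protocol."""
    candidates = [m for m in mirrors if m.protocol == protocol]
    local = [m for m in candidates if m.country == country.lower()]
    pool = local or candidates
    return rng.choice(pool) if pool else None
```

How the client learns its own country (locale setting, user configuration, and so on) is left open here. Choosing randomly within the pool is simply one way to spread load across equivalent mirrors.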

This would help with the Apache bandwidth issue, in that if a client such as Depot/DownloadManager provided the same capability as closer.cgi, then:

1.) To determine if the list file has been updated, all one needs to do is issue a HEAD request on the file and check the Last-Modified date, downloading it only if it is newer than the client's local copy.

2.) Apache server CPU time is currently spent parsing this file for each closer.cgi request; instead, the client would spend the CPU time doing this calculation. After the initial HEAD request to check when the mirror list was last updated, no other requests go to www.apache.org during the download process.
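
The HEAD check described in point 1 could be sketched roughly as follows (again in Python, for illustration only). The function names is_remote_newer and refresh_mirror_list, the 10 second timeout, and the use of the local file's modification time as the "client local copy" date are assumptions on my part.

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Optional

MIRRORS_URL = "http://www.apache.org/mirrors/mirrors.list"


def is_remote_newer(last_modified: Optional[str], local_mtime: Optional[float]) -> bool:
    """Return True when the local copy is missing or older than the Last-Modified header."""
    if local_mtime is None or not last_modified:
        return True  # no local copy, or the server sent no date: fetch to be safe
    remote_time = parsedate_to_datetime(last_modified).timestamp()
    return remote_time > local_mtime


def refresh_mirror_list(path: str, url: str = MIRRORS_URL) -> bool:
    """Send a HEAD request and download the list only if it changed.

    Returns True if a new copy was written to `path`.
    """
    local_mtime = os.path.getmtime(path) if os.path.exists(path) else None
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=10) as response:
        last_modified = response.headers.get("Last-Modified")
    if not is_remote_newer(last_modified, local_mtime):
        return False  # local copy is current; no body was transferred
    with urllib.request.urlopen(url, timeout=10) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return True
```

A client would call refresh_mirror_list("mirrors.list") once before resolving a mirror. Sending a GET with an If-Modified-Since header would achieve much the same thing in a single round trip; the HEAD variant is shown because it is the one described above.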

B.) Downfalls?

1.) If such a service were server-side, we do get a centralized way of managing it.

But it's difficult to control HTTP client behavior from the server beyond the most simplistic of HTTP redirects, and the cost of downloading a file becomes much greater when each download request has to be redirected through closer.cgi.

2.) Statistics: the one benefit I do see is that one could log requests through closer.cgi to track download statistics.

But these again would only be "partial stats", because any browser user can simply bookmark a mirror and go to it directly. It seems more appropriate for a "download stats" tool to operate behind the scenes and aggregate across all the mirrors to gain more accuracy in such statistics.


Cheers, -Mark

begin:vcard
fn:Mark Diggory
n:Diggory;Mark
org:Harvard University;Harvard MIT Data Center
adr:Harvard University;;G-6 Littauer Center (North Yard);Cambridge;Ma;02138-2901;United States
email;internet:[EMAIL PROTECTED]
title:Software Engineer
tel;work:617 496 7246
tel;fax:617 495 0438
tel;home:617 718 2033 
tel;cell:617 285 4106
url:http://www.hmdc.harvard.edu
version:2.1
end:vcard
