Sorry for the cross-post, but this seems relevant to both of these groups.
I was thinking about the subject of mirroring and redirection for the
ASF Repository. There was recently some discussion on the Depot list
concerning this, and I feel we could revisit the subject in the
interest of both groups.
www.apache.org/dyn/closer.cgi provides a simple resolution strategy that
attempts to determine the closest mirror available to the client browser.
It then generates an HTML page via a template that lists the selected
mirror as well as other available mirrors. With Depot, we have a
customized download client that could be extended to manage downloading
from a list of mirrors as well.
Here are my thoughts on this subject:
A.) This script is really not that big (90% of it is just parsing the
mirrors file), and the database (a flat text file called mirrors.list)
is not very big either. While closer.cgi is a neat service for
browsers, it's not exactly helpful for automated clients. Yet
mirrors.list is an excellent example of metadata that is exposed in an
effective manner such that automated clients can access it.
http://www.apache.org/mirrors/mirrors.list
I'm somewhat convinced that it would be simple to create a client
implementation which accomplishes the same functionality as closer.cgi
programmatically, so that it could be used to resolve a location to
download from when mirrors are available.
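
To make that concrete, here is a rough sketch of what such a client could
look like. It assumes each non-comment line of mirrors.list holds a
protocol, a country code and a base URL separated by whitespace; that
column layout is my assumption and should be checked against the real
file. The names Mirror, parse_mirrors and choose_mirror are made up for
illustration, and the country match only stands in for whatever
closeness test closer.cgi actually applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mirror:
    protocol: str   # e.g. "http" or "ftp" (assumed first column)
    country: str    # two-letter country code (assumed second column)
    base_url: str   # root URL of the mirror (assumed third column)


def parse_mirrors(text: str) -> List[Mirror]:
    """Parse mirror entries, skipping blank lines, comments and short lines."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not enough columns under the assumed layout
        mirrors.append(Mirror(fields[0].lower(), fields[1].lower(), fields[2]))
    return mirrors


def choose_mirror(mirrors: List[Mirror], country: str,
                  protocol: str = "http") -> Optional[Mirror]:
    """Prefer a mirror in the given country, else the first one with the protocol."""
    candidates = [m for m in mirrors if m.protocol == protocol]
    for mirror in candidates:
        if mirror.country == country.lower():
            return mirror
    return candidates[0] if candidates else None
```

A download client would call parse_mirrors on its local copy of the list
and then choose_mirror with the user's country code, falling back to the
main site if nothing comes back.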
This would be beneficial to the Apache Bandwidth issue in that if a
client such as Depot/DownloadManager managed the same capability as
closer.cgi then:
1.) To determine whether the list file has been updated, all one needs
to do is send a HEAD request for the file and check the Last-Modified
header, downloading the file if it is newer than the client's local copy.
2.) Today, Apache server CPU time is spent parsing this file for each
closer.cgi request; with a client-side implementation, the client
spends the CPU time on this calculation instead. After the initial HEAD
request to check when the mirror list was last updated, no other
requests go to www.apache.org in the download process.
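
The freshness check in point 1 could be sketched roughly as follows,
using only the Python standard library. The function names and the
10-second timeout are my own choices for illustration, and the sketch
treats a missing or unparseable Last-Modified header as "refresh to be
safe".

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Optional

MIRRORS_URL = "http://www.apache.org/mirrors/mirrors.list"


def remote_is_newer(last_modified: Optional[str],
                    local_mtime: Optional[float]) -> bool:
    """Compare a Last-Modified header value with a local modification time."""
    if local_mtime is None or not last_modified:
        return True  # no local copy, or the server sent no usable header
    try:
        remote = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        return True  # unparseable date: refresh rather than risk a stale list
    return remote > local_mtime


def fetch_last_modified(url: str = MIRRORS_URL) -> Optional[str]:
    """Send a HEAD request and return the raw Last-Modified header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Last-Modified")


def needs_refresh(local_path: str, url: str = MIRRORS_URL) -> bool:
    """True when the local copy is missing or older than the server's copy."""
    local_mtime = os.path.getmtime(local_path) if os.path.exists(local_path) else None
    return remote_is_newer(fetch_last_modified(url), local_mtime)
```

Only needs_refresh touches the network, and only with a HEAD request;
the full file is fetched just when it returns True.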
B.) Downfalls?
1.) If such a service were server-side, we do get a centralized way of
managing it.
But it's difficult to control HTTP client behavior from the server
beyond the simplest HTTP redirects, and the cost of downloading a file
becomes much greater because each download request has to be
redirected through closer.cgi.
2.) Statistics: I guess the benefit that I do see is that one could log
requests through closer.cgi to track download statistics.
But these again would only be "partial stats", because any browser can
simply bookmark a mirror and go to it directly. It seems more
appropriate for a "download stats" tool to operate behind the scenes
on all the mirrors and aggregate their numbers to gain more accuracy
in such statistics.
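
As a very small illustration of the aggregation idea, assuming each
mirror could somehow report a mapping of file name to download count
(how it would do that is not something I have worked out, and
aggregate_downloads is just a made-up name):

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_downloads(per_mirror: Iterable[Dict[str, int]]) -> Counter:
    """Sum per-file download counts reported by every mirror."""
    total = Counter()
    for counts in per_mirror:
        total.update(counts)  # Counter.update adds counts instead of replacing
    return total
```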
Cheers,
-Mark
Mark Diggory
Software Engineer
Harvard MIT Data Center
http://www.hmdc.harvard.edu