Moin

During the cloud summit we had a chat about the distribution mirrors used in our images. Currently we have three different approaches in the large public clouds:
Amazon: uses the CDN provided by Amazon. This CDN is also used as the backend for deb.debian.org.
Google: uses httpredir.debian.org.
Microsoft: maintains a network of mirrors in all of their production regions (24 as of now).

Because of the problematic state of httpredir and the potentially large number of systems running the same software, we came to the understanding that using some sort of mirror within the infrastructure is a good idea. Google had a mirror within their cloud, but scrapped it because they were unable to make it stable enough. I currently maintain the mirror network within Azure and have found that maintaining even 40+ mirrors is pretty low maintenance. It sometimes produces stray network timeouts, which it may be possible to fight with retry logic in ftpsync on connection timeouts.

Primed with this collective knowledge, we started to think about providing mirrors within the Google cloud again. We got the ok from Google to use their Cloud CDN as a public mirror. There is one technical limitation left in the implementation, which needs to be fixed first, but I'm confident they will be able to do that.

So I'd like to draft a plan for such a mirror. My plan for implementing this CDN mirror is as follows.

The CDN needs to be backed by instances running inside the Google cloud. We will run three mirror pairs in different locations. Two instances in one location will provide availability even if we need to take one offline. Most likely these mirrors will be located in us-central, europe-west and asia-east.

Each mirror will host a complete copy of the main and security archives. Disk space is cheap and we want to reduce the operational load of maintaining larger sets of mirrors. In contrast to the CDNs hosted by Fastly and Amazon, we also don't want to use the same backends (ftp.debian.org and security.debian.org) that this CDN is supposed to relieve of pressure. The mirrors behind these backends are not updated at exactly the same time.
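The retry logic mentioned above for stray network timeouts could be a small wrapper around the sync run. A minimal sketch, assuming a plain ftpsync invocation; the attempt count, delay and archive name are placeholders, not the production configuration:

```shell
# Minimal retry helper for transient network timeouts.  Counts, delay
# and the example ftpsync invocation are assumptions for illustration.
retry() {
    max=$1 delay=$2; shift 2
    try=1
    until "$@"; do
        if [ "$try" -ge "$max" ]; then
            echo "giving up after $max attempts: $*" >&2
            return 1
        fi
        echo "attempt $try failed; retrying in ${delay}s" >&2
        try=$((try + 1))
        sleep "$delay"
    done
}

# Example (hypothetical archive name):
#   retry 3 30 ftpsync sync:archive:debian
```

This only papers over transient failures; a sync that keeps failing still surfaces as an error after the last attempt.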
I'm not yet completely sure how this will interact with the cache within the CDN. This problem exists both within one location and between locations.

For updates within one location this will be a problem. Requests are load balanced between both instances, and the only thing we can do is implement session stickiness based on client IP. However, I assume that using a two-stage update should be enough: run stage one on all mirrors in the set, then stage two on all of them. This makes sure all referenced files (packages, by-hash files) are already available before any of the mirrors gets the InRelease file.

For updates between different locations we should be safe: different connections from one client should use the same location unless something fails. But we could think about doing two-stage updates throughout the whole network.

The mirror network will be updated from the outside via push to a mirror master in the US location, which will propagate the changes internally. What we may want is a check on boot of any of the internal systems to verify that the local mirror corresponds to the rest of the network.

Does anyone see problems with this plan?

Regards,
Bastian

-- 
Time is fluid ... like a river with currents, eddies, backwash.
		-- Spock, "The City on the Edge of Forever", stardate 3134.0
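The two-stage update described above could be sketched roughly like this. This is only an illustration of the ordering, assuming a plain file copy; the real mirrors would use ftpsync/rsync, and the paths and file layout here are assumptions:

```shell
# Sketch of a two-stage publish to one mirror: stage one copies all
# referenced files (pool, by-hash), stage two publishes the signed
# entry points (InRelease, Release, Release.gpg) last.  Run stage one
# against every mirror in the set before starting stage two anywhere.
publish_two_stage() {
    src=$1 dest=$2
    # Stage one: everything except the signed entry points, so every
    # referenced file exists before any index mentions it.
    (cd "$src" && find . -type f \
        ! -name InRelease ! -name Release ! -name Release.gpg) |
    while read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp "$src/$f" "$dest/$f"
    done
    # Stage two: publish the signed indices only after stage one is done.
    (cd "$src" && find . -type f \
        \( -name InRelease -o -name Release -o -name Release.gpg \)) |
    while read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp "$src/$f" "$dest/$f"
    done
}
```

A client that hits a half-updated mirror during stage one still sees a consistent archive, because the old InRelease only references files that remain in place until the indices are swapped in stage two.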