I have not done this, and it's been a few years since I researched this, so please correct me where I'm missing something.

There are three main approaches to doing the 'distributed content delivery' job

One is to use ANYCAST

This is leveraging a 'bug' in IPv4 and IPv6: nothing guarantees that an IP address exists in only one place on the Internet. Since the Internet works by the owners of IP address ranges advertising "I know how to get to this range," with routers dynamically building routes based on what they hear in advertisements from other similar routers, it turns out that you can advertise the same IP address range from multiple datacenters. As long as all the datacenters serve the same content (and the routes don't change, see more below), this is completely transparent to users.
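
As a toy sketch (purely illustrative, nothing you'd run on a router; the site names and path lengths are made up), the effect is that every site announces the same prefix and a given client's packets simply land at whichever site is "closest" on the network from that client's point of view:

# Toy illustration only: several sites announce the same prefix, and the
# routing system delivers each client's packets to whichever announcement
# has the shortest path from that client. Names and lengths are made up.
announcements = {
    # site             AS-path length as seen from one particular client
    "us-datacenter":   3,
    "eu-datacenter":   5,
    "asia-datacenter": 7,
}

def site_reached(paths):
    # With anycast the client doesn't choose a site; routing does.
    return min(paths, key=paths.get)

print(site_reached(announcements))   # -> us-datacenter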

The problem with this approach is that routes around the Internet do change, and you can end up with an end-user starting a session with one datacenter, only to have the routes change and the next packets in the session go to a different datacenter.

In my mind, this means that the ANYCAST approach works great only for completely stateless things, UDP DNS being a great example. It can work pretty well for short-lived connections, TCP DNS queries being a good example.

If you are talking about doing this internally in your company, rather than externally on the Internet, your routes are more stable and you can do a lot more with ANYCAST. There was a presentation a couple of years ago at LISA from some folks at Google talking about how they have an ANYCAST IP block that routes to a server in each major office, so that normal network resources (mail, DNS, printing, etc.) can have a single IP address throughout the company but be served by local servers in each office.

ANYCAST has the huge advantage that it deals automatically with datacenters going down and similar major outages. When a site stops advertising its existence, it disappears from the Internet within a couple of minutes.


The second approach is to use Dynamic DNS

This is setting up your domain with a very short time-to-live and adding smarts to your DNS server so that it looks at where the DNS request comes from before deciding what response to give. The short time-to-live is so that as clients move around the network they will keep looking up the name again and get the 'best' (closest, least heavily loaded, etc.) IP address to actually connect to.

The fundamental problem with this is that your DNS server usually doesn't see queries directly from the end-user; it sees queries from their ISP's DNS server. If it's a smallish ISP, this is just as good, but if it's something like AOL, you may see queries from everyone in the country arriving from a small set of IP addresses in their central office.

The practical problem with this is that many DNS servers impose a minimum time-to-live on the DNS data, so they may not query you as frequently as you would like. (Does anyone know what the practical minimum is nowadays on the Internet?)

This doesn't require special IP ranges; you can distribute load across any IP addresses that you own.

This requires that your DNS servers always be available (which is a perfect use of ANYCAST replication); otherwise, end-users trying to reach your site will have to wait through a DNS timeout against your primary DNS server before they give up and try your backup DNS servers.
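
To make the DNS approach concrete, here's a minimal Python sketch of just the selection policy (not the DNS wire protocol; in practice you'd do this with BIND views, a GSLB appliance, or similar). The site names, addresses, and the "European resolvers go to Amsterdam" heuristic are all made up for illustration:

# Minimal sketch of geo-aware DNS answer selection, assuming the query's
# source address (i.e. the ISP's recursive resolver, not the end user)
# is all we have to go on. Addresses are from documentation ranges.
SITES = {
    "wisconsin": "192.0.2.10",
    "amsterdam": "198.51.100.10",
}

TTL = 60  # keep it short so clients re-query and can be re-steered

def pick_site(resolver_ip):
    # Made-up policy: pretend anything starting with 193. is "European".
    # A real implementation would consult a geolocation or latency table.
    if resolver_ip.startswith("193."):
        return "amsterdam"
    return "wisconsin"

def answer(qname, resolver_ip):
    # Return (name, A record, TTL) for a query seen from resolver_ip.
    return qname, SITES[pick_site(resolver_ip)], TTL

print(answer("www.example.com", "193.0.10.1"))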


The final approach is to do Application Level redirection.

This is where you have a lightweight 'welcome' page that sends users to www1, www2, or www3, etc., depending on whatever load balancing or location criteria you want to use. This has the advantage that the load balancing is done based on where the end-user really is on the Internet relative to you. (You can get really tricky by doing performance measurements in JavaScript in the browser.)
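
As a rough sketch (Python standard library only; the wwwN hostnames and the simple round-robin choice are placeholders for whatever policy you'd actually use), the 'welcome' page can be nothing more than a tiny redirector:

# Tiny application-level redirector: pick a backend site and send the
# client there with a 302. Hostnames and the round-robin policy are
# placeholders; a real site might use client address, measured latency,
# or backend load instead.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle([
    "https://www1.example.com",
    "https://www2.example.com",
    "https://www3.example.com",
])

class Welcome(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", next(BACKENDS) + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Welcome).serve_forever()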

This has fallen out of favor in recent years. It requires a lot more planning up front, and your applications need to be aware of what's going on. Links that your end-users see contain location-specific info, so if they are passed around, people may hit the 'wrong' site (hopefully not one that's down at the time). However, it offers the most control over the load balancing and potentially the best performance for the end user.


A quick note on geolocation.

Where the user is in the physical world matters far less than how they are connected to you. It's very possible for someone to be far closer physically to one datacenter, but far closer network-wise to a different datacenter. Ping time to the user, BGP hop counts, and things like that matter much more. ANYCAST addresses this automatically by the way that it works: the end-user's packets go to the datacenter that is closest to them on the network, no matter where in the world that happens to be.
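
If you want to measure rather than guess (the same idea as the in-browser JavaScript trick mentioned above, done here in Python from the client side), something as crude as timing a TCP connect to each candidate site already beats a geography lookup. The hostnames below are placeholders:

# Rough network-proximity check: time a TCP connect to each candidate
# datacenter and pick the lowest round trip. Hostnames are placeholders.
import socket
import time

CANDIDATES = ["wi.example.com", "ams.example.com"]

def connect_rtt(host, port=443, timeout=2.0):
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    rtt = time.monotonic() - start
    sock.close()
    return rtt

def closest(hosts):
    best, best_rtt = None, float("inf")
    for host in hosts:
        try:
            rtt = connect_rtt(host)
        except OSError:
            continue  # unreachable right now; skip it
        if rtt < best_rtt:
            best, best_rtt = host, rtt
    return best

print(closest(CANDIDATES))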

The best availability/load distribution systems probably take advantage of all of the above methods.

David Lang

P.S. Dynamic content generation and commercial CDNs don't tend to work very well together; CDNs are optimized for serving static content, which is a very valuable niche, but it doesn't satisfy all uses.



On Wed, 21 Nov 2012, Jeremy Charles wrote:

I do agree with you and the others who have offered comments to the effect of 
it not being worth doing.

In order to illustrate why it's not worthwhile for us to do this, I need to 
investigate how I would go about doing it.  If I say it's not worth doing but I 
have no clue what it involves, my opinion won't hold any water.  I need to show 
that I have an understanding of what it would take *before* I can make a 
compelling argument that it's not worth doing.  (Or maybe I'll be surprised and 
find that there is a reasonable and worthwhile way to do it...)

So far, I have come up with these approaches today:

1)  Buy IPv4 and IPv6 blocks to use for our own CDN and then upgrade my 
Internet feeds and equipment to advertise these from all of the places where we 
host our content.

2)  Use geolocation information to build views in BIND that resolve the 
relevant domain names to the different servers that provide the content, based 
on perceived location.

I feel that I've been able to show why #1 doesn't make financial sense and #2 
involves lots of potential inaccuracy and unintended consequences (hint:  You 
may be surprised as to whether clients in Dubai should be sent to my Wisconsin 
headquarters or my Amsterdam colo).

Are there other approaches that I should also be looking at and evaluating?


-----Original Message-----
From: Tracy Reed [mailto:[email protected]]
Sent: Wednesday, November 21, 2012 1:37 PM
To: Jeremy Charles
Cc: [email protected]
Subject: Re: [lopsa-discuss] Build my own CDN

On Wed, Nov 21, 2012 at 07:57:35AM PST, Jeremy Charles spake thusly:
I’ve been asked to look in to what it would take for my employer to build its
own content delivery network hosted on our own hardware at various physical
locations around the world (all two of them, soon to be four).  The intent is
to host our own content, not anybody else’s.

I highly recommend looking into one of the existing federated CDNs which you
can join instead of your own CDN. They provide the software and handle
accounting etc. and you can then use your spare storage/bandwidth to make some
money utilizing your excess capacity.

http://onapp.com/cdn/ is the one I am familiar with.
