Re: [lopsa-discuss] Build my own CDN

David Lang Wed, 21 Nov 2012 13:48:39 -0800

I have not done this, and it's been a few years since I researched this, so please correct me where I'm missing something.


There are three approaches to doing the 'distributed content delivery' job.


One is to use ANYCAST

This leverages a 'bug' in IPv4 and IPv6: nothing guarantees that an IP address only exists in one place on the Internet. The Internet works by the people who own IP address ranges advertising "I know how to get to this range", and by routers dynamically creating routes based on the advertisements they hear from other similar routers. As a result, you can advertise the same IP address from multiple datacenters. As long as all the datacenters serve the same content (and the routes don't change; see more below), this is completely transparent to users.
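To make the routing side concrete, here is a toy model, a sketch under heavy assumptions rather than real BGP: each router keeps whichever announcement of the shared prefix has the shortest AS path, so routers in different parts of the Internet end up picking different datacenters for the same address. The site names, ASNs and the 192.0.2.0/24 prefix are hypothetical documentation values.

```python
def best_site(heard):
    """Return the site whose announcement has the shortest AS path.

    `heard` maps a site name to the AS path (tuple of ASNs) that one router
    received for the shared anycast prefix. Real BGP best-path selection
    weighs local preference, MED, origin and more before AS-path length;
    this sketch looks at path length only.
    """
    return min(heard, key=lambda site: len(heard[site]))


# Hypothetical views of the same anycast prefix (192.0.2.0/24) from two
# routers in different parts of the Internet. Both sites originate the
# prefix from AS 64500; all ASNs are documentation values.
router_in_europe = {
    "wisconsin": (64496, 64497, 64500),
    "amsterdam": (64498, 64500),
}
router_in_us = {
    "wisconsin": (64497, 64500),
    "amsterdam": (64496, 64498, 64499, 64500),
}

for name, view in [("europe", router_in_europe), ("us", router_in_us)]:
    print(name, "->", best_site(view))
# europe -> amsterdam
# us -> wisconsin
```

Neither router knows or cares that two sites share the address; each simply forwards toward the "nearest" announcement it heard.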

The problem with this approach is that routes around the Internet do change, and you can end up with an end-user starting a session with one datacenter, and then the routes change and the next packets in the session go to the other datacenter.

In my mind, this means that the ANYCAST approach only works great for completely stateless things, UDP DNS being a great example. It can work pretty well for short-lived connections, TCP DNS queries being a good example.
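Why a route change hurts stateful traffic but not stateless traffic can be shown with a small simulation (hypothetical site names, no real networking): each datacenter keeps its own connection table, so a TCP segment that lands at a site that never saw the handshake gets rejected, while a self-contained UDP query is answered wherever it lands.

```python
class Site:
    """A datacenter that keeps its own TCP connection table, shared with no one."""

    def __init__(self, name):
        self.name = name
        self.connections = set()

    def handle(self, kind, conn_id=None):
        if kind == "udp-query":
            # Self-contained request: any site can answer it.
            return f"{self.name}: answered"
        if kind == "syn":
            # Handshake: this site now remembers the connection.
            self.connections.add(conn_id)
            return f"{self.name}: connection opened"
        if kind == "data":
            if conn_id in self.connections:
                return f"{self.name}: data accepted"
            # This site never saw the handshake, so it rejects the segment.
            return f"{self.name}: reset (unknown connection)"
        raise ValueError(f"unknown packet kind: {kind}")


east, west = Site("east"), Site("west")

# The route to the anycast address flips from `east` to `west` mid-session.
print(east.handle("syn", conn_id=1))    # east: connection opened
print(west.handle("data", conn_id=1))   # west: reset (unknown connection)

# A stateless query survives the same flip untouched.
print(west.handle("udp-query"))         # west: answered
```

The shorter the connection, the smaller the window in which a route change can land in the middle of it, which is why short TCP DNS queries usually get away with it.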

If you are talking about doing this internally in your company, rather than externally to the Internet, your routes are more stable and you can do a lot more with ANYCAST. There was a presentation a couple of years ago at LISA from some folks at Google talking about how they have an ANYCAST IP block that routes to a server in each major office, so that normal network resources (mail, DNS, printing, etc.) can have a single IP address throughout the company but be served by local servers in each office.

ANYCAST has the huge advantage that it deals automatically with datacenters going down and similar major outages. When a site stops advertising its existence, it disappears from the Internet within a couple of minutes.
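Failover falls out of the same route selection. In this simplified sketch (hypothetical sites and ASNs; the minutes of real BGP convergence are not modelled), withdrawing one site's announcement just leaves the next-best path in place, with nothing changing on the client side.

```python
def best_site(heard):
    """Shortest-AS-path choice among the sites still announcing the prefix."""
    if not heard:
        return None  # nobody announces the prefix: the address is unreachable
    return min(heard, key=lambda site: len(heard[site]))


def withdraw(heard, site):
    """Model `site` no longer advertising the prefix: its route just disappears."""
    return {s: path for s, path in heard.items() if s != site}


view = {"amsterdam": (64498, 64500), "wisconsin": (64496, 64497, 64500)}
print(best_site(view))                          # amsterdam
after_outage = withdraw(view, "amsterdam")
print(best_site(after_outage))                  # wisconsin
print(best_site(withdraw(after_outage, "wisconsin")))  # None
```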



The second approach is to use Dynamic DNS

This means setting up your domain with a very short time-to-live and adding smarts to your DNS server so that it looks at where the DNS request comes from before deciding what response to give to a query. The short time-to-live is there so that, as clients move around the network, they keep looking up the name again and get the 'best' (closest, least heavily loaded, etc.) IP address for them to actually connect to.
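A minimal sketch of the "look at where the request is from" logic, assuming a hand-maintained table from source networks to datacenter addresses (all ranges below are documentation prefixes, and the 60-second TTL is an illustrative assumption, not a recommendation). It is not tied to any particular DNS server; it only shows the decision the server would make per query.

```python
import ipaddress

# Hypothetical policy: which datacenter address to hand out, keyed by the
# network the DNS query arrives from. All ranges are documentation prefixes.
POLICY = [
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.10"),  # served from site A
    (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.20"),   # served from site B
]
DEFAULT_ANSWER = "192.0.2.20"
SHORT_TTL = 60  # seconds; an assumed value for illustration


def answer_for(query_source):
    """Return (address, ttl) for an A query arriving from `query_source`.

    `query_source` is whoever sent the query to us, which is normally the
    ISP's recursive resolver rather than the end user's machine.
    """
    src = ipaddress.ip_address(query_source)
    for network, address in POLICY:
        if src in network:  # False (not an error) for IPv4/IPv6 mismatches
            return address, SHORT_TTL
    return DEFAULT_ANSWER, SHORT_TTL


print(answer_for("198.51.100.53"))  # ('192.0.2.10', 60)
print(answer_for("192.0.2.53"))     # ('192.0.2.20', 60)
```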

The fundamental problem with this is that your DNS server usually doesn't see queries directly from the end-user; you see queries from their ISP's DNS server. If it's a smallish ISP, this is just as good, but if it's something like AOL, you may see queries from everyone in the country arriving from a small set of IP addresses in their central office.

The practical problem with this is that many DNS servers impose a minimum time-to-live on the DNS data, so they may not query you as frequently as you'd like. (Does anyone know what the practical minimum is nowadays on the Internet?)
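The effect of such a floor is simple arithmetic. The sketch below compares a resolver that honours a 30-second TTL with one that clamps everything to 300 seconds; the 300-second floor is a made-up number, since the real-world figure is exactly what is being asked above.

```python
def effective_ttl(published_ttl, resolver_floor):
    """Seconds a resolver that enforces a minimum TTL will cache the record."""
    return max(published_ttl, resolver_floor)


def max_refreshes_per_hour(published_ttl, resolver_floor):
    """Upper bound on how often one caching resolver comes back to you."""
    # Treat a zero TTL as one second so the bound stays finite.
    return 3600 // max(1, effective_ttl(published_ttl, resolver_floor))


print(max_refreshes_per_hour(30, 0))    # 120
print(max_refreshes_per_hour(30, 300))  # 12
```

Put the other way round: behind the clamping resolver, every user of that ISP may keep getting a stale answer for up to five minutes after you change it.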

This doesn't require special IP ranges; you can distribute load across any IP addresses that you own.

This requires that your DNS servers always be available (which is a perfect use of ANYCAST replication); otherwise end-users trying to get to your site will need to wait through a DNS timeout as they try to access your primary DNS server before they give up and try your backup DNS servers.
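A rough model of that wait, assuming a resolver that tries nameservers strictly in order and spends a full timeout on each dead one. Real resolvers are smarter about remembering unresponsive servers, and the 2-second timeout and 50 ms round trip are assumed values; the point is only that each dead server in front of a live one adds a whole timeout to the lookup.

```python
def seconds_until_answer(servers_up, timeout, rtt=0.05):
    """Delay before a resolver gets an answer, trying nameservers in order.

    `servers_up` lists, in the order tried, whether each nameserver responds.
    Every dead server costs a full `timeout` before the next one is tried.
    Returns None if no server responds at all.
    """
    waited = 0.0
    for up in servers_up:
        if up:
            return waited + rtt
        waited += timeout
    return None


print(seconds_until_answer([True, True], timeout=2.0))    # primary is up
print(seconds_until_answer([False, True], timeout=2.0))   # primary is down
print(seconds_until_answer([False, False], timeout=2.0))  # None
```

With an ANYCAST nameserver address, a dead site's route is withdrawn, so the resolver's first attempt lands on a live site instead of waiting out the timeout.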



The final approach is to do Application Level redirection.

This is where you have a lightweight 'welcome' page that sends users to either www1, www2, www3, etc., depending on whatever load balancing or location criteria you want to use. This has the advantage that the load balancing is done based on where the end-user really is on the Internet relative to you. (You can get really tricky by doing performance measurements in JavaScript in the browser.)
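A minimal sketch of such a welcome page as a standard-library WSGI app, assuming a hypothetical status table (in practice it would be fed by monitoring) and using "least-loaded site that is up" as the selection rule. Location- or latency-based rules would slot into `choose_site` the same way. The hostnames are placeholders under example.com.

```python
# Hypothetical per-site status; in practice this would come from monitoring.
SITES = {
    "www1.example.com": {"up": True, "load": 0.70},
    "www2.example.com": {"up": True, "load": 0.35},
    "www3.example.com": {"up": False, "load": 0.10},
}


def choose_site(sites):
    """Pick the least-loaded site that is currently up, or None."""
    live = [name for name, status in sites.items() if status["up"]]
    if not live:
        return None
    return min(live, key=lambda name: sites[name]["load"])


def welcome_app(environ, start_response):
    """WSGI 'welcome' page: redirect the browser to the chosen site."""
    target = choose_site(SITES)
    if target is None:
        start_response("503 Service Unavailable",
                       [("Content-Type", "text/plain")])
        return [b"No site is available right now.\n"]
    # Keep the requested path so deep links still work after the redirect.
    path = environ.get("PATH_INFO") or "/"
    start_response("302 Found", [("Location", f"https://{target}{path}"),
                                 ("Content-Length", "0")])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, welcome_app).serve_forever()` will serve it. Note that the redirect target is exactly the location-specific link that can end up being passed around.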

This has fallen out of favor in recent years. It requires a lot more planning up front, and your applications need to be aware of what's going on. Links that your end-users see contain location-specific info, so if they are passed around, people may hit the 'wrong' site (hopefully not one that's down at the time). However, it offers the most control over the load balancing and potentially the best performance for the end user.



A quick note on geolocation.

Where the user is in the physical world matters far less than how they are connected to you. It's very possible for someone to be far closer physically to one datacenter, but far closer network-wise to a different datacenter. Ping time to the user, BGP hop calculations, and things like that really matter more. ANYCAST addresses this automatically by the way that it works: the end-user sends their packets to the datacenter that is closest to them on the network, no matter where in the world that happens to be.
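The difference between the two notions of "closest" can be sketched as two selection rules side by side. The coordinates are approximate city locations taken from the sites mentioned later in this thread. The round-trip times are invented purely to show that the rules can disagree; they are not measurements of any real path.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_by_geography(client, sites):
    """Site with the smallest physical distance to the client."""
    return min(sites, key=lambda name: great_circle_km(client, sites[name]))


def nearest_by_network(rtt_ms):
    """Site with the lowest measured round-trip time to the client."""
    return min(rtt_ms, key=rtt_ms.get)


# Approximate city coordinates; the RTTs are hypothetical, not measured.
sites = {"amsterdam": (52.37, 4.90), "wisconsin": (43.07, -89.40)}
client = (25.20, 55.27)  # Dubai
rtt_ms = {"amsterdam": 180.0, "wisconsin": 150.0}

print(nearest_by_geography(client, sites))  # amsterdam
print(nearest_by_network(rtt_ms))           # wisconsin
```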

The best availability/load distribution systems probably take advantage of all of the above methods.


David Lang

P.S. Dynamic content generation and commercial CDNs don't tend to work very well together. CDNs are optimized for serving static content, which is a very valuable niche, but it doesn't satisfy all uses.




On Wed, 21 Nov 2012, Jeremy Charles wrote:

I do agree with you and the others who have offered comments to the effect of 
it not being worth doing.

In order to illustrate why it's not worthwhile for us to do this, I need to 
investigate how I would go about doing it.  If I say it's not worth doing but I 
have no clue what it involves, my opinion won't hold any water.  I need to show 
that I have an understanding of what it would take *before* I can make a 
compelling argument that it's not worth doing.  (Or maybe I'll be surprised and 
find that there is a reasonable and worthwhile way to do it...)

So far, I have come up with these approaches today:

1)  Buy IPv4 and IPv6 blocks to use for our own CDN and then upgrade my 
Internet feeds and equipment to advertise these from all of the places where we 
host our content.

2)  Use geolocation information to build views in BIND that resolve the 
relevant domain names to the different servers that provide the content, based 
on perceived location.

I feel that I've been able to show why #1 doesn't make financial sense and #2 
involves lots of potential inaccuracy and unintended consequences (hint:  You 
may be surprised as to whether clients in Dubai should be sent to my Wisconsin 
headquarters or my Amsterdam colo).

Are there other approaches that I should also be looking at and evaluating?


-----Original Message-----
From: Tracy Reed [mailto:[email protected]]
Sent: Wednesday, November 21, 2012 1:37 PM
To: Jeremy Charles
Cc: [email protected]
Subject: Re: [lopsa-discuss] Build my own CDN

On Wed, Nov 21, 2012 at 07:57:35AM PST, Jeremy Charles spake thusly:

I’ve been asked to look into what it would take for my employer to build its
own content delivery network hosted on our own hardware at various physical
locations around the world (all two of them, soon to be four).  The intent is
to host our own content, not anybody else’s.


I highly recommend looking into one of the existing federated CDNs which you
can join instead of your own CDN. They provide the software and handle
accounting etc. and you can then use your spare storage/bandwidth to make some
money utilizing your excess capacity.

http://onapp.com/cdn/ is the one I am familiar with.

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
