I have not done this, and it's been a few years since I researched this, so
please correct me where I'm missing something.
There are three approaches to doing the 'distributed content delivery' job.
One is to use ANYCAST.
This is leveraging a 'bug' in IPv4 and IPv6 where nothing guarantees that an
IP address exists in only one place on the Internet. Since the Internet works by
the people who own the IP address ranges advertising "I know how to get to this
range" and dynamically creating routes based on what they hear in terms of
advertisements from other similar routers, it turns out that you can advertise
the same IP address from multiple datacenters. As long as all the datacenters
serve the same content (and the routes don't change, see more below) this is
completely transparent to users.
The problem with this approach is that routes around the Internet do change,
and you can end up with an end-user starting a session with one datacenter, and
then the routes change and the next packets in the session going to the other
datacenter.
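To make the failure mode concrete, here is a toy sketch (not a real BGP implementation). It assumes each client network simply forwards to whichever advertisement has the shortest path; the site names, hop counts, and the functions `best_site` and `deliver` are all invented for illustration.

```python
# Toy model of anycast: two sites advertise the same prefix, and a client
# network forwards to whichever advertisement has the shortest path.
# Site names and path lengths are invented for illustration.

def best_site(path_lengths):
    """Return the site whose advertisement has the shortest path.

    path_lengths maps site name -> hop count as seen by one client network.
    Ties are broken alphabetically so the result is deterministic.
    """
    return min(sorted(path_lengths), key=lambda site: path_lengths[site])


def deliver(packets, routes_over_time):
    """Pair each packet with the site that receives it.

    routes_over_time[i] is the routing view in effect when packet i is sent.
    """
    return [(pkt, best_site(view)) for pkt, view in zip(packets, routes_over_time)]


if __name__ == "__main__":
    before = {"dc-east": 3, "dc-west": 5}
    after = {"dc-east": 6, "dc-west": 5}  # a route toward dc-east got longer
    session = ["SYN", "ACK", "GET /", "ACK"]
    views = [before, before, after, after]
    for pkt, site in deliver(session, views):
        print(f"{pkt:6} -> {site}")
```

In this run the handshake lands on dc-east and the request lands on dc-west, which has no record of the connection. A single-packet UDP exchange never spans a route change, which is why it is not affected in the same way.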
In my mind, this means that the ANYCAST approach only works great for
completely stateless things, UDP DNS being a great example. It can work
pretty well for short-lived connections, TCP DNS queries being a good example.
If you are talking about doing this internally in your company, rather than
externally to the Internet, your routes are more stable and you can do a lot
more with ANYCAST. There was a presentation a couple of years ago at LISA from
some folks at Google talking about how they have an ANYCAST IP block that routes
to a server in each major office so that normal network resources (mail, DNS,
printing, etc) can have a single IP address throughout the company, but be
served by local servers in each office.
ANYCAST has the huge advantage that it deals automatically with datacenters
going down and similar major outages. When a site stops advertising its
existence, it disappears from the Internet within a couple of minutes.
The second approach is to use Dynamic DNS.
This is setting up your domain to have a very short time-to-live and adding
smarts in your DNS server so that it looks at where the DNS request is from
before deciding what response to provide to a query. The short time-to-live is
so that as clients move around the network they will keep looking up the name
again and get the 'best' (closest, least heavily loaded, etc) IP address for
them to actually connect to.
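As a rough sketch of the 'smarts' involved, the decision boils down to a lookup keyed on the address the query arrives from. Everything here is an assumption for illustration: the networks and addresses are taken from the RFC 5737 documentation ranges, the 60 second TTL is arbitrary, and `answer_for` is a made-up helper, not part of any DNS server.

```python
import ipaddress

# Hypothetical mapping from the querying network to the site we would
# rather hand out. All addresses are RFC 5737 documentation ranges.
SITE_BY_QUERY_NET = [
    (ipaddress.ip_network("192.0.2.0/24"), "198.51.100.10"),    # prefers site A
    (ipaddress.ip_network("203.0.113.0/24"), "198.51.100.20"),  # prefers site B
]
DEFAULT_ANSWER = "198.51.100.10"
SHORT_TTL = 60  # seconds; an assumed value, caches may hold the answer longer


def answer_for(query_source_ip):
    """Return (address, ttl) for a query arriving from query_source_ip."""
    addr = ipaddress.ip_address(query_source_ip)
    for net, site_ip in SITE_BY_QUERY_NET:
        if addr in net:
            return site_ip, SHORT_TTL
    # Unknown source: fall back to a fixed default site.
    return DEFAULT_ANSWER, SHORT_TTL


if __name__ == "__main__":
    for src in ("192.0.2.53", "203.0.113.7", "10.1.2.3"):
        print(src, "->", answer_for(src))
```

Note that the only input is the source address of the DNS query itself; the server has nothing else to go on when it chooses an answer.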
The fundamental problem with this is that your DNS server usually doesn't see
queries directly from the end-user, you see queries from their ISP's DNS server.
If it's a smallish ISP, this is just as good, but if it's something like AOL,
you may see queries from everyone in the country arriving from a small set of IP
addresses in their central office.
The practical problem with this is that many DNS servers impose a minimum
time-to-live on the DNS data, so they may not query you as frequently as you
like. (Does anyone know what the practical minimum is nowadays on the Internet?)
This approach doesn't require special IP ranges; you can distribute load across
any IP addresses that you own.
This requires that your DNS servers be always available (which is a perfect
use of ANYCAST replication), otherwise end-users trying to get to your site will
need to wait through a DNS timeout as they try to access your primary DNS server
before they give up and try your backup DNS servers.
The final approach is to do Application Level redirection.
This is where you have a lightweight 'welcome' page that sends users to either
www1, www2, or www3 etc depending on whatever load balancing or location
criteria you want to use. This has the advantage that the load balancing is done
based on where the end-user really is on the Internet relative to you. (You can
get really tricky by doing performance measurements in JavaScript in a browser.)
This has fallen out of favor in recent years. It requires a lot more planning
up front, and your applications need to be aware of what's going on. Links that
your end-users see contain location specific info, so if they are passed around,
people may hit the 'wrong' site (hopefully not one that's down at the time).
However it offers the most control over the load balancing and potentially the
best performance for the end user.
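A minimal sketch of what the 'welcome' page logic might look like, assuming the choice is made on load alone. The hostnames, load figures, and the helpers `pick_site` and `welcome_response` are hypothetical; a real setup would feed in whatever health and location data you actually have.

```python
# Hypothetical site table: hostname -> whether the site is up and its
# current load (0.0 to 1.0). None of these values come from a real system.
SITES = {
    "www1.example.com": {"up": True, "load": 0.80},
    "www2.example.com": {"up": True, "load": 0.35},
    "www3.example.com": {"up": False, "load": 0.10},
}


def pick_site(sites):
    """Return the least loaded site that is up, or None if all are down."""
    candidates = [(info["load"], host) for host, info in sites.items() if info["up"]]
    if not candidates:
        return None
    return min(candidates)[1]


def welcome_response(path, sites=SITES):
    """Build (status, headers) for the lightweight 'welcome' page."""
    host = pick_site(sites)
    if host is None:
        return 503, {"Retry-After": "30"}
    # The redirect target names a specific site, so this URL is what the
    # end-user ends up seeing and bookmarking.
    return 302, {"Location": f"http://{host}{path}"}


if __name__ == "__main__":
    print(welcome_response("/index.html"))
```

The location-specific hostname in the `Location` header is exactly the thing that leaks into links people pass around.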
A quick note on geolocation.
Where the user is in the physical world matters far less than how they are
connected to you. It's very possible for someone to be far closer physically to
one datacenter, but far closer network-wise to a different datacenter. Ping time
to the user, BGP hop calculations, and things like that really matter more.
ANYCAST addresses this automatically by the way that it works, the end-user
sends their packets to the datacenter that is closest to them on the network,
no matter where in the world that happens to be.
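Here is a small sketch of how the two notions of 'closest' can disagree. The coordinates and round-trip times are invented (abstract points on the equator, not real cities or measurements), and the function names are my own.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_by_geography(user, sites):
    """Pick the site with the smallest physical distance to the user."""
    return min(sites, key=lambda name: great_circle_km(user, sites[name]["coords"]))


def closest_by_network(sites):
    """Pick the site with the lowest measured round-trip time."""
    return min(sites, key=lambda name: sites[name]["rtt_ms"])


if __name__ == "__main__":
    user = (0.0, 0.0)
    # Invented numbers: the physically nearer site has the worse network path.
    sites = {
        "dc-near": {"coords": (0.0, 10.0), "rtt_ms": 180},
        "dc-far": {"coords": (0.0, 60.0), "rtt_ms": 40},
    }
    print("by geography:", closest_by_geography(user, sites))
    print("by network:  ", closest_by_network(sites))
```

With these made-up inputs the geographic choice is dc-near and the network choice is dc-far, which is the situation described above.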
The best availability/load distribution systems probably take advantage of all
of the above methods.
David Lang
P.S. Dynamic content generation and commercial CDNs don't tend to work very well
together. CDNs are optimized for serving static content, which is a very
valuable niche, but it doesn't satisfy all uses.
On Wed, 21 Nov 2012, Jeremy Charles wrote:
I do agree with you and the others who have offered comments to the effect of
it not being worth doing.
In order to illustrate why it's not worthwhile for us to do this, I need to
investigate how I would go about doing it. If I say it's not worth doing but I
have no clue what it involves, my opinion won't hold any water. I need to show
that I have an understanding of what it would take *before* I can make a
compelling argument that it's not worth doing. (Or maybe I'll be surprised and
find that there is a reasonable and worthwhile way to do it...)
So far, I have come up with these approaches today:
1) Buy IPv4 and IPv6 blocks to use for our own CDN and then upgrade my
Internet feeds and equipment to advertise these from all of the places where we
host our content.
2) Use geolocation information to build views in BIND that resolve the
relevant domain names to the different servers that provide the content, based
on perceived location.
I feel that I've been able to show why #1 doesn't make financial sense and #2
involves lots of potential inaccuracy and unintended consequences (hint: You
may be surprised as to whether clients in Dubai should be sent to my Wisconsin
headquarters or my Amsterdam colo).
Are there other approaches that I should also be looking at and evaluating?
-----Original Message-----
From: Tracy Reed [mailto:[email protected]]
Sent: Wednesday, November 21, 2012 1:37 PM
To: Jeremy Charles
Cc: [email protected]
Subject: Re: [lopsa-discuss] Build my own CDN
On Wed, Nov 21, 2012 at 07:57:35AM PST, Jeremy Charles spake thusly:
I’ve been asked to look into what it would take for my employer to build its
own content delivery network hosted on our own hardware at various physical
locations around the world (all two of them, soon to be four). The intent is
to host our own content, not anybody else’s.
I highly recommend looking into one of the existing federated CDNs which you
can join instead of building your own CDN. They provide the software and handle
accounting etc. and you can then use your spare storage/bandwidth to make some
money utilizing your excess capacity.
http://onapp.com/cdn/ is the one I am familiar with.