Tim,

You raise a good point about CMS functionality exceeding that of flat pages, and I understand that some CMS features wouldn't work. However, for our public-facing site, the majority of the content is essentially static pages that would be reproduced perfectly. Another portion is generated dynamically from a GET request where all of the information is in the URL, so I think those pages could probably also be reproduced. Only a relatively small amount of our content requires more dynamic interaction with the server and couldn't be flattened.
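Flattening those GET-driven pages amounts to mapping each URL (path plus query string) onto a unique flat filename. A minimal sketch in Python of that kind of mapping — the function name and digest scheme are illustrative only, not an existing wget feature:

```python
import hashlib
from urllib.parse import urlsplit

def url_to_flat_name(url: str) -> str:
    """Map a URL to a unique flat filename (hypothetical helper).

    Because every URL becomes a single file, paths such as /about and
    /about/something no longer collide as file vs. directory.
    """
    parts = urlsplit(url)
    key = parts.path or "/"
    if parts.query:                      # GET parameters are part of the key
        key += "?" + parts.query
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    readable = key.strip("/").replace("/", "_").replace("?", "_") or "index"
    return f"{readable}-{digest}.html"

print(url_to_flat_name("http://www.example.com/about"))
print(url_to_flat_name("http://www.example.com/about/something"))
```

The readable prefix keeps the mirror browsable, while the digest guarantees uniqueness even when two distinct URLs would otherwise flatten to the same name.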
We do have fairly resilient load-balanced systems, but they are not infallible. They are also difficult (and expensive, in licensing terms) to replicate reliably outside our data centre to cater for a total loss of connectivity to it. What I would like to do is use Apache's proxy-balancer for some of the current load balancing, but fail over to flat pages (reasoning that this is far better than nothing) in the event that all of the load-balanced nodes fail, and to mirror the flattened pages off-site in case our data centre loses network connectivity.

Thanks,
Paul

>-----Original Message-----
>From: Tim Ruehsen [mailto:[email protected]]
>Sent: Tuesday, October 16, 2012 9:14 AM
>To: [email protected]
>Cc: Paul Beckett (ITCS)
>Subject: Re: [Bug-wget] wget mirror site failing due to file / directory name
>clashes
>
>On Friday 12 October 2012, Paul Beckett (ITCS) wrote:
>> I am attempting to use wget to create a mirrored copy of a CMS
>> (Liferay) website. I want to be able to fail over to this static copy
>> in case the application server goes offline. I therefore need the
>> URLs to remain absolutely identical. The problem I have is that I
>> cannot figure out how to configure wget in a way that will cope with:
>> http://www.example.com/about
>> http://www.example.com/about/something
>
>You can't make a failover copy with wget-like tools, except perhaps for very
>simple websites, and a CMS isn't that simple.
>On a web server there will be many essential resources that are not available
>via remote access (e.g. scripts, servlets, server configuration, database,
>...).
>What I want to say is: even if you solve this (minor) problem of not being able
>to map URL paths to the local filesystem (a problem that occurs from time to
>time and can generally be solved by transforming the URL into a key/value
>pair.
AFAIK, wget doesn't have such a feature yet), you will stumble over the
>next problem that prevents your copy from being a failover copy.
>
>It sounds as though you have administrative access to your company's web server.
>So why not use any of the thousands of "professional"
>backup/failover/redundancy mechanisms for such use cases?
>E.g. a filesystem and database cluster - today there should be out-of-the-box
>solutions.
>
>But maybe I don't get your intention...
>
>Tim
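The failover arrangement Paul describes — proxy-balancer in front of the application nodes, falling back to the flattened copy only when every node is down — could be sketched roughly as the following Apache configuration. Hostnames and ports are placeholders; it assumes mod_proxy_balancer with its hot-standby flag, and a separate vhost serving the flattened pages:

```apache
# Hypothetical mod_proxy_balancer setup: two application nodes plus a
# hot standby (status=+H) that serves the flattened static mirror.
<Proxy "balancer://cms">
    BalancerMember "http://app1.internal:8080"
    BalancerMember "http://app2.internal:8080"
    # Hot standby: only used when all regular members are unavailable.
    BalancerMember "http://flat.internal:80" status=+H
</Proxy>
ProxyPass        "/" "balancer://cms/"
ProxyPassReverse "/" "balancer://cms/"
```

Mirroring the flattened pages off-site would then be a matter of replicating the static document root behind `flat.internal`, since it is plain files with no application-server dependency.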
