Re: [Bug-wget] wget mirror site failing due to file / directory name clashes

Micah Cowan Fri, 12 Oct 2012 13:55:54 -0700

On 10/12/2012 06:38 AM, Paul Beckett (ITCS) wrote:
> I am attempting to use wget to create a mirrored copy of a CMS (Liferay) 
> website. I want to be able to failover to this static copy in case the 
> application server goes offline. I therefore need the URLs to remain 
> absolutely identical. The problem I have is that I cannot figure out how I 
> can configure wget in a way that will cope with:
> http://www.example.com/about
> http://www.example.com/about/something
> 
> In this case either the file or directory 'about' already exists and prevents 
> the second from being created.


Further discussion/info about this problem:

http://savannah.gnu.org/bugs/?func=detailitem&item_id=23756
http://savannah.gnu.org/bugs/?func=detailitem&item_id=29647

> 
> Initially I thought the most obvious solution was to rely on Apache's 
> DirectoryIndex, and save the files as:
> /about/index.html
> /about/something/index.html
> 
> But, currently I can't figure out how I can do this in a way that doesn't 
> break either the relative path to other pages or create links to the 
> index.html rather than the original location. I need the links (a href etc.) 
> to still go to /about and not explicitly call /index.html - as this will mean 
> people may bookmark things that won't exist when the CMS comes back.

Why not use links like /about/, rather than /about? Then it should
hopefully work for both cases.

-mjc
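
A minimal sketch of the layout discussed above, offered as an illustration
rather than as anything wget does itself. It assumes every mirrored URL is an
HTML page (not an image or stylesheet), and the helper names local_path and
public_link are hypothetical. Each page is stored as <path>/index.html, so
'about' is always a directory, and the published link keeps the original path
with a trailing slash so no link ever names index.html.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path(url: str, index_name: str = "index.html") -> str:
    """Map a page URL to a file path relative to the mirror root.

    Hypothetical helper: every page becomes <path>/index.html, so a page
    and its sub-pages can never compete for the same file system name.
    """
    path = urlsplit(url).path
    # Drop empty segments produced by leading, trailing or doubled slashes.
    parts = [segment for segment in path.split("/") if segment]
    return str(PurePosixPath(*parts, index_name))


def public_link(url: str) -> str:
    """Return the link to publish: the original path with a trailing slash."""
    path = urlsplit(url).path
    return path if path.endswith("/") else path + "/"


if __name__ == "__main__":
    for page in ("http://www.example.com/about",
                 "http://www.example.com/about/something"):
        print(public_link(page), "->", local_path(page))
    # /about/ -> about/index.html
    # /about/something/ -> about/something/index.html
```

With this layout, a request for /about/ is answered from about/index.html by
Apache's DirectoryIndex, and bookmarks never contain index.html.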
