I am attempting to use wget to create a mirrored copy of a CMS (Liferay) website. I want to be able to fail over to this static copy if the application server goes offline, so the URLs need to remain absolutely identical. The problem is that I cannot figure out how to configure wget so that it copes with both of these URLs:

http://www.example.com/about
http://www.example.com/about/something

In this case, whichever of the two is saved first, the file or the directory 'about', already exists and prevents the second from being created.

Initially I thought the most obvious solution was to rely on Apache's DirectoryIndex and save the files as:

/about/index.html
/about/something/index.html

But currently I can't figure out how to do this in a way that neither breaks the relative paths to other pages nor creates links to index.html rather than to the original location. I need the links (a href etc.) to still go to /about and not explicitly call /index.html, as otherwise people may bookmark things that won't exist when the CMS comes back.

If anyone can offer me any advice on how I can achieve this (either the correct options, or how I could patch the source code), I would be extremely grateful.

Thanks,
Paul

/usr/local/bin/wget \
  --background \
  --append-output=/tmp/wget-log \
  --no-verbose \
  --tries=20 \
  --waitretry=10 \
  --retry-connrefused \
  --limit-rate=100m \
  --quota=10000m \
  --timestamping \
  --directory-prefix=/usr/local/apache2/content/uk.ac.uea.www_flat2 \
  --protocol-directories \
  --user-agent="UEA WebSite Flattener" \
  --backup-converted \
  -e robots=off \
  --page-requisites \
  --convert-links \
  --recursive \
  --level=inf \
  --trust-server-names \
  --domains example.com \
  www.example.com
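
To make the DirectoryIndex layout I am after concrete, here is a minimal sketch in Python of the URL-to-file mapping I would like to end up with. It is not something wget does for me today; the function name url_to_static_path and the rule "a last path segment containing a dot is an asset and is left alone" are my own assumptions, purely for illustration.

```python
from urllib.parse import urlsplit


def url_to_static_path(url, index_name="index.html"):
    """Map a URL to the relative file path it would occupy on disk.

    Hypothetical helper: page-like URLs become <path>/index.html so that
    /about and /about/something can coexist as directories.
    """
    # Keep only the path component and drop empty segments, so that
    # "/about" and "/about/" map to the same location.
    parts = [p for p in urlsplit(url).path.split("/") if p]

    # Assumption: a last segment with a dot (style.css, logo.png) is an
    # asset and keeps its own file name.
    if parts and "." in parts[-1]:
        return "/".join(parts)

    # Everything else is treated as a page and stored as a directory index.
    return "/".join(parts + [index_name])


if __name__ == "__main__":
    for u in ("http://www.example.com/about",
              "http://www.example.com/about/something"):
        print(u, "->", url_to_static_path(u))
```

With this mapping both pages live in directories, so neither blocks the other; what remains unsolved is getting the rewritten links to keep pointing at /about rather than at /about/index.html.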
