Hi, I'm trying to download part of a site that uses CGI scripts to serve pages but also offers downloads such as PDFs, and I want to get both with wget.
For example, a page that links to PDFs is mysite.com/blah.cgi?key=1234, and one of the PDFs is served from mysite.com/blah.cgi?key=5678, but the actual PDF file is called awesome.pdf.

If I don't use --content-disposition, the main page mysite.com/blah.cgi?key=1234 that wget saves contains a relative link /blah.cgi?key=5678 to the PDF, which wget also downloads. So when I'm browsing through the files on my computer I can click the link and the PDF opens in my browser. However, the PDF is saved as blah.cgi?key=5678, which isn't descriptive, especially as the site has a few hundred PDFs that are otherwise very usefully named.

Using --content-disposition works in that the PDF is now saved as awesome.pdf, but the hyperlink in the downloaded blah.cgi?key=1234 page still points to /blah.cgi?key=5678 and not /awesome.pdf, so I can no longer browse my downloaded copy of the site on my computer as before.

Is there a way to fix this, so that I get both the actual filename and the correct link in the downloaded HTML page? Given that wget rewrites the links in saved HTML pages anyway, it isn't hard to imagine an if-statement that picks the correct filename to point to when writing the new links.

Thanks,
Tom
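
P.S. To make concrete the kind of if-statement I mean, here is a rough Python sketch of the rewriting done as a separate pass over a saved page. This is not a wget feature and not how wget works internally. The mapping SAVED_NAMES, the function rewrite_links and the sample HTML are all made up for illustration, and the regex only handles double-quoted href attributes.

```python
import re

# Hypothetical mapping: the CGI link as it appears in the saved page,
# mapped to the name the file got via --content-disposition.
SAVED_NAMES = {
    "/blah.cgi?key=5678": "awesome.pdf",
}

# Matches double-quoted href attributes only (an assumption of this sketch).
HREF_RE = re.compile(r'href="([^"]*)"')


def rewrite_links(html, saved_names):
    """Return html with hrefs pointed at the saved filename where one is known."""
    def pick_target(match):
        link = match.group(1)
        # The "if-statement": use the Content-Disposition name if we have one,
        # otherwise leave the link exactly as it was.
        if link in saved_names:
            return 'href="{}"'.format(saved_names[link])
        return match.group(0)

    return HREF_RE.sub(pick_target, html)


page = '<a href="/blah.cgi?key=5678">PDF</a> <a href="/blah.cgi?key=1234">Index</a>'
print(rewrite_links(page, SAVED_NAMES))
```

Here only the link to the PDF changes; the link to the other CGI page is left alone because nothing is recorded for it in the mapping.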
