[Bug-wget] Question - saved links with --content-disposition

Harling, Thomas Sun, 06 Jul 2014 12:09:23 -0700

Hi

I'm trying to download part of a site that uses CGI scripts to serve pages but 
also has downloads such as PDFs, and I want to get both with wget.


For example, a page that links to the PDFs is mysite.com/blah.cgi?key=1234, and 
one of the PDFs is served from mysite.com/blah.cgi?key=5678, but the actual PDF 
file is called awesome.pdf.

If I don't use --content-disposition, the main page mysite.com/blah.cgi?key=1234 
that wget downloads has a relative link /blah.cgi?key=5678 to the PDF, which it 
also downloads, so when I'm browsing through the files on my computer I can 
click the link and the PDF opens in my browser.

However, the PDF is saved as blah.cgi?key=5678, which isn't descriptive at all, 
especially as the site has a few hundred PDFs which are otherwise very usefully 
named. Using --content-disposition works in that the PDF is now saved as 
awesome.pdf, but the hyperlink in the downloaded blah.cgi?key=1234 page still 
points to /blah.cgi?key=5678 and not /awesome.pdf, so I can't browse my 
downloaded copy of the site on my computer as before.

Is there a way to fix this, so I get both the actual filename and the correct 
link in the downloaded HTML page? Given that the links in saved HTML pages are 
rewritten by wget anyway, it isn't hard to imagine an if-statement which picks 
the correct filename to point to when writing the new links.
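
To illustrate the kind of check I mean, here is a rough Python sketch of a 
post-processing step. It is not wget's code: rewrite_links and the saved_names 
mapping (original link -> name the file was actually saved under) are names I 
made up for this example, and I'm assuming the links appear in the page as 
plain quoted href attributes, exactly as written in the mapping.

```python
import re


def rewrite_links(html, saved_names):
    """Point each href listed in saved_names at the file's saved name.

    saved_names is a hypothetical mapping from the link as written in the
    page to the local filename; links not in the mapping are left alone.
    """
    def swap(match):
        quote, url = match.group(1), match.group(2)
        # The "if-statement": prefer the name the file was really saved as.
        if url in saved_names:
            url = saved_names[url]
        return "href=" + quote + url + quote

    # Match href="..." or href='...' (the closing quote must equal the opening one).
    return re.sub(r'href=(["\'])(.*?)\1', swap, html)


page = '<a href="/blah.cgi?key=5678">Awesome PDF</a> <a href="/other.html">Other</a>'
saved = {"/blah.cgi?key=5678": "awesome.pdf"}
print(rewrite_links(page, saved))
# prints: <a href="awesome.pdf">Awesome PDF</a> <a href="/other.html">Other</a>
```

A real solution would need an HTML parser rather than a regular expression, 
and would have to get the mapping from somewhere (wget would know it at 
download time), but the lookup itself is no more than this.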

Thanks
Tom
