It is indeed necessary to parse web pages in order to discover files with particular extensions. However, those pages could be read in memory rather than written to disk. So I, too, wish that option were available (unfortunately, I am not able to do the coding myself).
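
To illustrate the idea, here is a minimal sketch in Python (not wget code; the URL is a placeholder, and a real implementation inside wget would do this recursively in C). It fetches one page, scans it for PDF links entirely in memory, and never writes the HTML itself to disk:

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class PdfLinkParser(HTMLParser):
    # Collects href targets ending in ".pdf" from <a> tags.
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".pdf"):
                # Resolve relative links against the page URL.
                self.pdf_links.append(urljoin(self.base_url, value))

page_url = "http://example.com/"  # placeholder
with urllib.request.urlopen(page_url) as resp:
    # The HTML is held in memory only; nothing is saved to disk.
    charset = resp.headers.get_content_charset() or "utf-8"
    html = resp.read().decode(charset, errors="replace")

parser = PdfLinkParser(page_url)
parser.feed(html)
for link in parser.pdf_links:
    print(link)  # only these URLs would then be fetched and saved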
Jamal

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Micah Cowan
Sent: Thursday, December 22, 2011 3:46 PM
To: Vikram Narayanan
Cc: [email protected]
Subject: Re: [Bug-wget] Downloading files with specific extensions

(2011/12/22 08:39), Vikram Narayanan wrote:
> On Thu, Dec 22, 2011 at 7:59 PM, Tony Lewis <[email protected]> wrote:
>> Vikram Narayanan wrote:
>>
>>> Isn't it a waste of bandwidth?
>>> Is it not possible to check only the PDF files without downloading the
>> whole content?
>>
>> wget has to download HTML content in order to discover where the PDF files
>> are located.
>>
> But curl does it with ease. :)
> Why can't wget?

I don't really know what you're referring to here. Curl doesn't do
recursive fetches, so it can't possibly download "just the PDF files"
from a site by itself.

I'm not sure why you think _any_ program can manage to find all the PDF
files on a site, without first downloading enough information to find
where those PDF files might be.

-mjc
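
(For reference, the closest wget can do today is a recursive fetch with an accept list; the host and depth below are just placeholders:

wget -r -l 2 -nd -A pdf http://example.com/docs/

As I understand the manual, in recursive mode wget still saves each HTML page to disk so it can scan it for links, then deletes the pages the accept list rejects; an in-memory parsing option would avoid exactly that round trip through the disk.)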
