Sorry, I hadn't seen that Steven had already answered the question.

-----Original Message-----
From: Steven M. Schweda [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 12, 2007 10:05
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Subject: Re: simple wget question

From: R Kimber

> What I'm trying to download is what I might express as:
> 
> http://www.stirling.gov.uk/*.pdf

   At last.

> but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".
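
   (If the files lived on an FTP server, a wildcard URL would work
directly.  A minimal sketch, assuming a made-up host name:

      wget "ftp://ftp.example.com/pub/*.pdf"

The quotes keep the shell from expanding the "*" itself.  There is no
HTTP equivalent, which is why the question about a page of links
matters.)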

>   I just wondered if it was possible
> for wget to filter out everything except *.pdf - i.e. wget would look
> at a site, or a directory on a site, and just accept those files that
> match a pattern.

   Wget has options for this, as suggested before ("wget -h"):

[...]
Recursive accept/reject:
  -A,  --accept=LIST               comma-separated list of accepted extensions.
  -R,  --reject=LIST               comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions ("-r") to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.
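
   For example, if there is some index page which links to the PDF
files (the page URL below is only a guess), a command along these
lines should pull down just the PDFs linked from it:

      wget -r -l 1 -A pdf http://www.stirling.gov.uk/some-index-page

Here "-r" enables recursive retrieval, "-l 1" limits it to links one
level deep, and "-A pdf" keeps only files whose names end in .pdf.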

------------------------------------------------------------------------

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
