Micah Cowan wrote:

> If you mean that you want Wget to find any file that matches that
> wildcard, well no: Wget can do that for FTP, which supports directory
> listings; it can't do that for HTTP, which has no means for listing
> files in a "directory" (unless it has been extended, for example with
> WebDAV, to do so).

Seems to me that's a big "unless", because we've all seen plenty of websites
that serve HTTP directory listings. Apache generates them out of the box (and
by default) whenever a directory has no index.htm[l] file.

Perhaps we could have a feature to grab all or some of the files in an HTTP
directory listing. Maybe something like this could be made to work:

wget http://www.exelana.com/images/mc*.gif

Perhaps we would need an option such as --http-directory (the first name that
came to mind, though not necessarily the most intuitive one) to tell wget
explicitly how it is expected to behave. Or perhaps it could simply strip the
filename from the URL whenever wildcards appear in an HTTP request.

At any rate (with or without a command-line option), wget would retrieve
http://www.exelana.com/images/ and then retrieve any links whose target
matches mc*.gif.
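
Incidentally, you can get fairly close to that behavior with the options wget
already has, by recursing one level into the directory and using an accept
list. Just a sketch, and I haven't tried it against that particular server:

wget -r -l1 -nd -np -A 'mc*.gif' http://www.exelana.com/images/

That fetches the generated index page, follows its links one level deep,
keeps only the files matching mc*.gif, and discards the rest. A built-in
wildcard syntax would essentially be sugar for this.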

If wget is going to support HTTP directory listings explicitly, it probably
needs to be intelligent enough to ignore the sorting links. In Apache's case,
those are things like <A HREF="?N=D">Name</A>.
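
If the wget at hand is new enough to have --reject-regex (I believe it
appeared somewhere around 1.14, so take that as an assumption; older versions
only offer the suffix-based -A/-R lists), those sort links can be refused
before they are ever requested, since the regex is matched against the whole
URL, query string included:

wget -r -l1 -nd -np -A 'mc*.gif' --reject-regex '\?' http://www.exelana.com/images/

Rejecting everything with a query string is a blunt instrument, but for a
plain Apache listing that is exactly the ?N=D, ?M=A, etc. links we want to
skip. A built-in --http-directory mode could apply that kind of filter
automatically.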

Anyone have any idea how many different HTTP directory listing formats are
out there?

Tony
