asm c wrote:
> I've recently been using wget, and got it working for the most part, but
> there's one issue that's really been bugging me. One of the parameters I
> use is '-R "*action=*,*oldid=*"' (side note on the platform: ZSH on
> NetBSD on the SDF public access unix system, although I've also used it
> on windows with the same result). The purpose of this parameter is so
> that, when wget crawls a mid-sized wiki I'd like to have a local copy
> of, it doesn't bother with all the history pages, edit pages, and so
> forth. Not downloading these would save me an enormous amount of time.
> Unfortunately, the parameter is ignored until after the php page is
> downloaded. So, because it waits until it's downloaded to delete it,
> using the param doesn't really help at all.
> Does anyone know how I can stop wget from even downloading matching pages?

Well, you don't mention it, but I'll assume that those patterns occur in
the "query string" portion of the URL: that is, they follow a question
mark (?) that appears at some point.

Unfortunately, the -R and -A options only apply to the "filename"
portion of the URL: that is, whatever falls between the first question
mark and the last slash (/) preceding it. Confusingly, the same patterns
are also applied _after_ files are downloaded, to determine whether
they should be deleted after the fact: so Wget probably downloads those
files you really wish it wouldn't, and then deletes them afterwards
anyway.
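
To make the matching rule above concrete, here is a rough Python sketch of
it. This is only an illustration, not Wget's actual code: the function
names, the example.org URL, and the use of fnmatch for the shell-style
patterns are all my own assumptions.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def filename_portion(url):
    """Return the part between the last slash of the path and the '?'.

    Hypothetical helper: it only models the rule described above.
    """
    path = urlsplit(url).path          # the query string is not part of .path
    return path.rsplit("/", 1)[-1]


def rejected_before_download(url, patterns):
    """True if any reject pattern matches the filename portion of the URL."""
    name = filename_portion(url)
    return any(fnmatch(name, pattern) for pattern in patterns)


# The patterns from the original -R option.
patterns = ["*action=*", "*oldid=*"]

# Made-up wiki URL of the kind the crawl would run into.
url = "http://wiki.example.org/index.php?title=Main_Page&action=edit"

print(filename_portion(url))                    # index.php
print(rejected_before_download(url, patterns))  # False
```

Since only "index.php" is compared against the patterns, neither
"*action=*" nor "*oldid=*" can match before the request is made, so the
page is fetched anyway.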

Worse, there is currently no way around this. It is part of a suite of
problems that are slated to be addressed soon. The most pertinent to
your problem, though, is the need for a way to match against query
strings. I'm very much hoping to get around to this before the next
major Wget release, version 1.12. It's being tracked here:


If you add yourself to the Cc list, you'll be able to follow its
progress.

Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
