Todd Pattist wrote:
> Micah Cowan wrote:
>> After another look at the relevant portions of the source code, it looks
>> like accept/reject rules are _always_ applied against the local
>> filename, contrary to what I'd been thinking. This needs to be changed.
>> (But it probably won't be, any time soon.)
>>
> .... and
>> If something _does_ match the accept rules, and turns out after download
>> to be an HTML file (determined by the server's headers), it will
>> traverse it further; but of course it won't delete them afterward
>> because they matched the accept list.
> 
> I'd like to help clarify for others who may read this how wget 1.11 is
> working for .php, .cgi and similar files (on Windows, but I expect the
> behavior is the same on other OSs).  It has taken me a while to grok
> this even partially.  The first quote above is correct.  The second
> quote is not, at least not when you use html_extension = on as I do. 
> The reason it's not correct is because the first quote is correct.

Well, -E is special, true. But in general the second quote is (by
definition) correct.

-E, obviously, _shouldn't_ be special...

> I haven't yet quite figured out file extension matching versus string
> matching in filenames, but extensions seem to match regardless of
> leading characters or following ?id=1 parameters.

That's right; the "query" portion of the URL is not used to determine
matching. There are, of course, times when you specifically want to tell
wget not to follow certain query strings (such as edit or print links
in wikis); wget doesn't currently support this, though I plan to fix that.
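
To make the point about the query portion concrete, here is a rough
sketch in Python. It is not wget's actual code, and it deliberately
works from the URL rather than from the local filename that wget 1.11
really tests. The function names url_filename and matches are made up
for illustration. The suffix-versus-pattern rule follows the documented
behavior of -A/-R: an element containing any of *, ?, [ or ] is treated
as a pattern, anything else as a suffix.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit
import posixpath

WILDCARDS = "*?[]"


def url_filename(url):
    """Return the last path component of the URL, query string dropped."""
    return posixpath.basename(urlsplit(url).path)


def matches(url, rules):
    """Return True if the URL's filename matches any accept/reject rule.

    Hypothetical helper: a rule containing a wildcard character is
    treated as a shell-style pattern, otherwise as a plain suffix.
    """
    name = url_filename(url)
    for rule in rules:
        if any(ch in rule for ch in WILDCARDS):
            if fnmatchcase(name, rule):
                return True
        elif name.endswith(rule):
            return True
    return False


if __name__ == "__main__":
    # The ?id=1 part plays no role in the match.
    print(matches("http://example.com/view.php?id=1", ["php"]))
    # A ".php" that appears only in the query string is not seen at all.
    print(matches("http://example.com/index.html?x=.php", ["php"]))
```

Because the query string is stripped before matching, a rule list has no
way to tell view.php?action=edit apart from view.php?id=1, which is
exactly the limitation described above.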

I'm fairly unhappy with the whole accept/reject mechanism, actually:
matching on filename extensions is something of a kludge anyway (though,
obviously, much more efficient than going by content-type); and ignoring
what the user has requested wrt html files is bad, IMO. I should
probably deprecate that behavior in 1.11.1.

Matching against the local filename instead of the URL is a separate
issue. IIRC, there may actually already be a patch on this, or else I
applied one that addressed one situation and not another.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/