Micah Cowan wrote:
Announcing the release of version 1.11.1 of GNU Wget.
** Documentation of accept/reject lists in the manual's "Types of
Files" section now explains various aspects of their behavior that may
be surprising, and notes that they may change in the future.
I'm glad to see that this made it into the docs - even if this behavior
is drastically altered in the next rev.
I'm interested in your thoughts on the future of the accept/reject
filter options. Currently, accept/reject provides mixed control over
file retention and link traversal. The filters do not apply to HTML
files during the first pass (which governs link traversal), but they
do apply during the second pass (which governs file retention).
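To make that two-pass behavior concrete, here is a minimal sketch (the URL and depth are placeholders):

```shell
# Recursive retrieval, accepting only PDFs.
# Pass 1 (traversal): .html pages are downloaded anyway so that
# their links can be followed, even though "pdf" does not match them.
# Pass 2 (retention): the accept list is applied again, and the
# downloaded HTML files are deleted.
wget -r -l 2 -A pdf http://example.com/docs/
```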
I can see splitting the accept/reject filters into two independent
filter sets. One set would follow/no-follow links and the other set
would keep/delete files after retrieval. Obviously query string
matching would be nice in the first set. OTOH, I can imagine keeping
accept/reject solely to control file retention and using more advanced
logic than simple htm/html extension matching to get deeper traversal of
script sites when permitted by the recursion depth or other controls.
What do you see as the best approach?
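For illustration only, the first approach might look something like this (these option names are purely hypothetical; none of them exist in wget today):

```shell
# Hypothetical split: traversal filters decide which links to follow
# (including query-string matching), while retention filters decide
# which retrieved files to keep.
wget -r --follow-accept='*.html,*.php?page=*' \
        --keep-accept='*.pdf' http://example.com/
```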
As long as I'm posting, I'll give some very minor feedback on the docs.
It would be nice to have a cross-reference of the three formats (short
option, long option, and wgetrc command), or simply to list all three
in the first discussion of each option. Section 4 uses that method, but
Section 2 does not. I often found myself searching for the correct
wgetrc startup-file format after reading up on an option. As an
example, Section 2
tells you that `-l depth' or `--level=depth' can be used as recursion
depth options, but you have to do a bit of searching to find out that
"reclevel=depth" and not "level=depth" is the matching wgetrc command.
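In other words, the same setting currently has to be tracked down in three places; listed together, the recursion-depth forms would read:

```shell
wget -r -l 5 http://example.com/        # short option
wget -r --level=5 http://example.com/   # long option
# and in a wgetrc startup file (note "reclevel", not "level"):
#   reclevel = 5
```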
Related to the same issue and for other Windows users who may search the
archive: as a new user, it's nice to use the long form option, since it
makes it easier to remember what you're trying to do. However, a
command line of 200 chars is hard to read. I found myself organizing
all my options into a customized wgetrc file for each site. In Windows,
each instance of wget started via a batch file would spawn its own
local environment, so I could run multiple copies of wget
simultaneously, each initiated from a separate batch file and each with
its own customized "set WGETRC=Site1-wgetrc.txt" followed by the basic
"wget Site1.com" command.
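A per-site wrapper along those lines might look like this (Site1-wgetrc.txt and Site1.com are the placeholder names from the example above):

```bat
:: Site1.bat -- launches wget with a site-specific startup file.
:: "set" affects only this script's copy of the environment, so
:: several such scripts can run wget concurrently, each pointing
:: at its own wgetrc.
set WGETRC=Site1-wgetrc.txt
wget Site1.com
```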
** Documentation of --no-parents now explains how a trailing slash, or
lack thereof, in the specified URL, will affect behavior.
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/