Micah Cowan wrote:
Announcing the release of version 1.11.1 of GNU Wget.
** Documentation of accept/reject lists in the manual's "Types of
Files" section now explains various aspects of their behavior that may
be surprising, and notes that they may change in the future.
I'm glad to see that this made it into the docs, even if the behavior is drastically altered in the next rev. I'm interested in your thoughts on the future of the accept/reject filter options. Currently, accept/reject provides mixed control over file retention and link traversal: the filters do not apply to HTML files during the first pass (for traversal), but do apply during the second pass, for file retention.

I can see splitting accept/reject into two independent filter sets: one would follow/no-follow links, and the other would keep/delete files after retrieval. Obviously, query-string matching would be nice in the first set. OTOH, I can imagine keeping accept/reject solely to control file retention, and using more advanced logic than simple htm/html extension matching to get deeper traversal of script sites when permitted by the recursion depth or other controls. What do you see as the best approach?
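To make the two-pass behavior concrete, here is a small sketch of the decision as I understand it -- the function name and suffix variable are illustrative only, not wget internals. Suppose --accept pdf was given: HTML pages are still fetched so their links can be followed, but the accept list decides afterwards whether each file is kept.

```shell
#!/bin/sh
# Hypothetical sketch of the current accept/reject two-pass logic
# (illustrative names, not wget internals), assuming "--accept pdf".
ACCEPT_SUFFIX="pdf"

fate_of() {
  case "$1" in
    *.html|*.htm)       echo "$1: fetched for traversal, then deleted" ;;
    *."$ACCEPT_SUFFIX") echo "$1: kept" ;;
    *)                  echo "$1: never retrieved" ;;
  esac
}

fate_of index.html   # fetched for traversal, then deleted
fate_of paper.pdf    # kept
fate_of logo.png     # never retrieved
```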

As long as I'm posting, I'll give some very minor feedback on the docs. It would be nice to have a cross-reference of the three formats for each option -- short option, long option, and wgetrc command -- or to just list all three in the first discussion of the option. Section 4 uses that method, but Section 2 does not. I often found myself searching for the correct wgetrc startup-file syntax after reading up on an option. As an example, Section 2 tells you that `-l depth' or `--level=depth' can be used to set the recursion depth, but you have to do a bit of searching to find out that "reclevel=depth", not "level=depth", is the matching wgetrc command.
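For instance, the recursion-depth option in all three forms, where the wgetrc name is the non-obvious one (the URL is just a placeholder):

```
# Command line, short and long forms:
#   wget -r -l 5 http://example.com/
#   wget -r --level=5 http://example.com/
# .wgetrc equivalent -- note the command is "reclevel", not "level":
recursive = on
reclevel = 5
```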

Related to the same issue, and for other Windows users who may search the archive: as a new user, it's nice to use the long-form options, since they make it easier to remember what you're trying to do. However, a command line of 200 characters is hard to read, so I found myself organizing all my options into a customized wgetrc file for each site. In Windows, each instance of wget started via a batch file spawns its own local environment, so I could run multiple copies of wget simultaneously, each initiated from a separate batch file and each with its own customized "set WGETRC=Site1-wgetrc.txt" followed by the basic "wget Site1.com" command.
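A minimal sketch of one such per-site batch file (the names Site1.bat, Site1-wgetrc.txt, and Site1.com are just placeholders):

```
@echo off
rem Site1.bat -- each batch file gets its own local environment,
rem so several copies of wget can run at once, each pointed at
rem its own startup file via the WGETRC environment variable.
set WGETRC=Site1-wgetrc.txt
wget Site1.com
```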

** Documentation of --no-parent now explains how a trailing slash, or
lack thereof, in the specified URL will affect behavior.
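A quick illustration of that distinction, with hypothetical paths -- HTTP has no real notion of a directory, so the trailing slash is what tells wget where the "parent" boundary lies:

```
# With -r --no-parent (-np):
#   wget -r -np http://example.com/a/b/   # /a/b/ is a directory; only
#                                         # files under /a/b/ are fetched
#   wget -r -np http://example.com/a/b    # /a/b is taken as a file, so
#                                         # the limit becomes /a/
```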

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
