Hi Allan,
You'll generally get better results if you post to the mailing list
(wget@sunsite.dk). I've added it to the recipients list.

Coombe, Allan David (DPS) wrote:
> Hi Micah,
>
> First some context…
> We are using wget 1.11.3 to mirror a web site so we can do some offline
> processing on it. The mirror is on a Solaris 10 x86 server.
>
> The problem we are getting appears to be because the URLs in the HTML
> pages that are harvested by wget for downloading have mixed case (the
> site we are mirroring is running on a Windows 2000 server using IIS) and
> the directory structure created on the mirror has 'duplicate'
> directories because of the mixed case.
>
> For example, the URLs in HTML pages /Senate/committees/index.htm and
> /senate/committees/index.htm refer to the same file but wget creates 2
> different directory structures on the mirror site for these URLs.
>
> This appears to be a fairly basic thing, but we can't see any wget
> options that allow us to treat URLs case insensitively.
>
> We don't really want to post-process the site just to merge the files
> and directories with different case.

Unfortunately, nothing really comes to mind. If you'd like, you could file
a feature request at https://savannah.gnu.org/bugs/?func=additem&group=wget
for an option asking Wget to treat URLs case-insensitively.

Finding local files case-insensitively on a case-sensitive filesystem
would be a PITA, but adding and looking up URLs case-insensitively in the
internal blacklist hash wouldn't be too hard. I probably wouldn't get to
that for a while, though.

Another useful option might be to change the name of "index" files, so
that, for instance, you could have URLs like http://foo/ result in
"foo/index.htm" or "foo/default.html", rather than "foo/index.html".

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
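For illustration, here is a minimal sketch of the lookup half of the idea above: a "seen URLs" set keyed on a case-folded form of each URL, so that /Senate/committees/index.htm and /senate/committees/index.htm collide. This is Python, not Wget code (Wget is written in C), and `fold_url` and `SeenUrls` are hypothetical names. The sketch assumes the server treats URL paths case-insensitively, as the IIS site in this thread appears to. Scheme and host are case-insensitive anyway; the query string is left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def fold_url(url: str) -> str:
    """Return a key under which URLs differing only in case collide.

    Assumption: the server ignores case in paths (IIS-style). The query
    is kept as-is, since applications may treat it case-sensitively.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        # netloc is lowercased wholesale; this would also fold any
        # user:password@ part, which is acceptable for a sketch.
        parts.netloc.lower(),
        parts.path.casefold(),
        parts.query,
        "",  # drop the fragment: it never reaches the server
    ))


class SeenUrls:
    """Hypothetical stand-in for a crawler's 'already queued' hash."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def add(self, url: str) -> bool:
        """Record url; return True only if no case variant was seen before."""
        key = fold_url(url)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True
```

With only the lookup changed like this, whichever spelling is encountered first would presumably decide the local directory name, and later case variants would simply be skipped rather than saved a second time.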