Re: Only follow paths with /res/ in them
Brian wrote:
> I would like to follow all the urls on a site that contain /res/ in the
> path. I've tried using -I and -A, with values such as res, *res*,
> */res/*, etc. Here is an example that downloads pretty much the entire
> site, rather than what I appear (to me) to have specified:
>
>   wget -O- -q http://img.site.org/b/imgboard.html | \
>     wget -q -r -l1 -O- -I '*res*' -A '*res*' --force-html \
>       -B http://img.site.org/b/ -i-
>
> The urls I would like to follow and output to the command line are of
> the form: http://img.site.org/b/res/97867797.html

-A isn't useful here: it's applied only against the filename portion of the URL.

-I is what you want; the trouble is that the * wildcard doesn't match slashes (there are plans to introduce a ** wildcard, probably in 1.13). So unfortunately you gotta do -I 'res,*/res,*/*/res' etc. as needed.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
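The -I workaround above can be illustrated with a short Python sketch (hypothetical helper names; this is not wget's actual matcher), assuming -I patterns are matched against the URL's directory portion and that '*' and '?' stop at slashes:

```python
import re

def wget_wildcard_to_re(pattern):
    """Translate a wget-style wildcard into a regex, assuming (as the
    message describes) that '*' and '?' do not match '/'."""
    out = []
    for ch in pattern:
        if ch == '*':
            out.append('[^/]*')
        elif ch == '?':
            out.append('[^/]')
        else:
            out.append(re.escape(ch))
    return re.compile('^(?:' + ''.join(out) + ')$')

def include_dir(directory, patterns):
    """Return True if the URL's directory portion matches any -I pattern."""
    return any(wget_wildcard_to_re(p).match(directory) for p in patterns)

# '*res*' cannot match 'b/res' because '*' stops at the slash:
assert not include_dir('b/res', ['*res*'])
# ...so each directory depth needs its own pattern, as suggested:
assert include_dir('b/res', ['res', '*/res', '*/*/res'])
assert include_dir('res', ['res', '*/res', '*/*/res'])
```

With a future '**' wildcard a single pattern could cover every depth; until then, one pattern per directory depth is needed, which is exactly the -I 'res,*/res,*/*/res' suggestion.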
Re: Only follow paths with /res/ in them
Oh! Please don't use this list (wget@sunsite.dk) any more; I'm trying to get the dotsrc folks to make it go away/forward to bug-wget (I need to ping 'em on this again). The official list for Wget is now [EMAIL PROTECTED]

Micah Cowan wrote:
> [...]

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
Maciej W. Rozycki wrote:
> On Fri, 31 Oct 2008, Micah Cowan wrote:
>> I will ask the dotsrc.org folks to set up this mailing list as a
>> forwarding alias to [EMAIL PROTECTED] (the reverse of recent history).
>> At that time, no further mails will be sent to subscribers of this
>> list. Please subscribe to [EMAIL PROTECTED] instead.
>>
>> At this time, I'm thinking of merging wget@sunsite.dk and
>> [EMAIL PROTECTED]; there isn't really enough traffic to justify
>> separate lists, IMO; and often discussions come up on submitted
>> patches that are of interest to everyone.
>
> I am puzzled. You mean you declare wget@sunsite.dk retired, and
> [EMAIL PROTECTED] is to be used from now on for the purpose of the
> former list instead? And [EMAIL PROTECTED] will most likely be retired
> as well soon, with the replacement to be [EMAIL PROTECTED] as well?

Yup, that's what I mean.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
MAILING LIST IS MOVING: [EMAIL PROTECTED]
[EMAIL PROTECTED] is now back in business as a full-fledged mailing list, and not just a forwarding alias to here. Please subscribe using the interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at your earliest convenience.

I had hoped to leave forwarding still enabled during the transition; I subscribed wget@sunsite.dk, but that did not seem to do the trick. So mails at [EMAIL PROTECTED] will not show up here at the present time.

I will ask the dotsrc.org folks to set up this mailing list as a forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At that time, no further mails will be sent to subscribers of this list. Please subscribe to [EMAIL PROTECTED] instead.

At this time, I'm thinking of merging wget@sunsite.dk and [EMAIL PROTECTED]; there isn't really enough traffic to justify separate lists, IMO; and often discussions come up on submitted patches that are of interest to everyone.

Please avoid continued use of this list if possible. The gmane and mail-archive.com sites will be asked to use the new list for archiving purposes (and of course, bug-wget will also be archived via GNU's pipermail setup).

Some of the reasons for this migration may be found at http://article.gmane.org/gmane.comp.web.wget.general/8200/ In addition, people have recently been having difficulties with spam blocking preventing their unsubscription(!), subscription, or even contacting dotsrc.org staff about resolving subscription problems.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
Micah Cowan wrote:
> [EMAIL PROTECTED] is now back in business as a full-fledged mailing
> list, and not just a forwarding alias to here. Please subscribe using
> the interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at
> your earliest convenience.

Email interface: send an email to [EMAIL PROTECTED]

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: -m alias
Michelle Konzack wrote:
> ??? -- How can you post without being subscribed? My posts were all
> definitively rejected when I tried to post to this list.

Strange. People are definitely posting to the list without having to be subscribed. However, folks have been known to be rejected as spam, even for unsubscription requests. :\

I've been considering a move to gnu servers; but I'm not sure their spam filters are better (though at least they wouldn't reject unsubscriptions, I think). But mostly, I'm not motivated enough to get off my lazy butt yet. If we start having more serious problems, perhaps the motivation will increase sufficiently...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> I'm trying to download the same file from the same server; the command
> line I use:
>
>   wget --debug -o log -c -t 0 --load-cookies=cookie_file \
>     http://rapidshare.com/files/153131390/Blind-Test.rar
>
> Attached below are 2 files: the log with 1.9.1 and the log with 1.10.2.
> Both logs were made when Blind-Test.rar was already on my HDD. Sorry
> for some mess in the logs, but the russian language is used on my
> console.

Thanks very much for providing these, Maksim; they were very helpful. (Sorry for getting back to you so late: it's been busy lately.)

I've confirmed this behavioral difference (though I compared the current development sources against 1.8.2, rather than 1.10.2 to 1.9.1). Your logs involve a 302 redirection before arriving at the real file, but that's just a red herring.

The difference is that when 1.9.1 encountered a server that would respond to a byte-range request with 200 (meaning it doesn't know how to send partial contents), but with a Content-Length value matching the size of the local file, then wget would close the connection and not proceed to redownload. 1.10.2, on the other hand, would just re-download it. Actually, I'll have to confirm this, but I think that current Wget will re-download it, but not overwrite the current content, until it arrives at some content corresponding to bytes beyond the current content.

I need to investigate further to see if this change was somehow intentional (though I can't imagine what the reasoning would be); if I don't find a good reason not to, I'll revert this behavior. Probably for the 1.12 release, but I might possibly punt it to 1.13 on the grounds that it's not a recent regression (however, it should really be a quick fix, so most likely it'll be in for 1.12).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
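The 1.9.1-era decision described above can be sketched in a few lines of Python (a hypothetical helper, not wget source code): a server answering a byte-range request with 200 cannot send partial content, so if its Content-Length matches the local file size there is nothing left to fetch.

```python
def should_redownload(status, content_length, local_size):
    """Sketch of the resume decision described in the message above
    (hypothetical helper, not wget's implementation)."""
    if status == 206:
        return False  # server honors the range request and resumes
    if status == 200 and content_length == local_size:
        return False  # full body offered, but we already have all of it
    return True       # local file is incomplete: fetch (or refetch)

assert should_redownload(200, 1000, 500)       # partial local file: refetch
assert not should_redownload(200, 1000, 1000)  # sizes match: nothing to do
assert not should_redownload(206, 500, 500)    # server resumes for us
```

Under this sketch, the behavioral regression amounts to the middle case (200 with matching Content-Length) returning True in 1.10.x instead of False.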
Re: --mirror and --cut-dirs=2 bug?
Brock Murch wrote:
> I try to keep a mirror of NASA atteph ancillary data for modis
> processing. I know that means little, but I have a cron script that
> runs 2 times a day. Sometimes it works, and others, not so much. The
> sh script is listed at the end of this email below, as are the
> contents of the remote ftp server's root and portions of the log.
>
> I don't need all the data on the remote server, only some, thus I use
> --cut-dirs. To make matters stranger, the software (also from NASA)
> that uses these files looks for them in a single place on the client
> machine where the software runs, but needs data from 2 different
> directories on the remote ftp server. If the data is not on the
> client machine, the software kindly ftp's the files to the local
> directory. However, I don't allow write access to that directory, as
> many people use the software and when it is d/l'ed it has the wrong
> perms for others to use it; thus I mirror the data I need from the
> ftp site locally. In the script below, there are 2 wget commands, but
> they are to slightly different directories (MODISA, MODIST).

I wouldn't recommend that. Using the same output directory for two different source directories seems likely to lead to problems. You'd most likely be better off by pulling to two locations, and then combining them afterwards. I don't know for sure that it _will_ cause problems (except if they happen to have same-named files), as long as .listing files are being properly removed (there were some recently-fixed bugs related to that, I think? ...just appending new listings on top of existing files).

> It appears to me that the problem occurs if there is an ftp server
> error, and wget starts a retry. wget goes to the server root, gets
> the .listing from there for some reason (as opposed to the directory
> it should go to on the server), and then goes to the dir it needs to
> mirror and can't find the files (that are listed in the root dir) and
> creates dirs, and then I get No such file errors and recursive
> directories created. Any advice would be appreciated.

This snippet seems to be the source of the problem:

  Error in server response, closing control connection. Retrying.
  --14:53:53-- ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/ (try: 2)
    => `/home1/software/modis/atteph/2002/110/.listing'
  Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.  ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD not required.
  ==> PASV ... done.  ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this issue recently (though I'm not 100% sure that it went to release: I believe so). I believe we made some related fixes more recently.

You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could you please try to verify whether Wget continues to exhibit this problem in the latest release version?

I'll also try to look into this as I have time (but it might be awhile before I can give it some serious attention; it'd be very helpful if you could do a little more legwork).

Thanks very much,
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah Cowan wrote:
> I believe we made some related fixes more recently. You provided a
> great amount of useful information, but one thing that seems to be
> missing (or I missed it) is the Wget version number. Judging from the
> log, I'd say it's 1.10.2 or older; the most recent version of Wget is
> 1.11.4; could you please try to verify whether Wget continues to
> exhibit this problem in the latest release version?

This problem looks like the one that Mike Grant fixed in October of 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it should definitely be fixed in 1.11.4. Please let me know if it isn't.

Regards,
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
More on query matching [Re: Need Design Documents]
kalpana ravi wrote:
> Hi Everybody,

Hi kalpana,

You sent this message to me and [EMAIL PROTECTED]; you wanted [EMAIL PROTECTED]

> My name is kalpana Ravi. I am planning to contribute to add one of the
> features listed in https://savannah.gnu.org/bugs/?22089. For that I
> need to see the design diagrams to understand better. Does anybody
> know where the UML diagrams are?

We don't have UML diagrams for wget: you'll just have to read the sources (which, unfortunately, are messy). I have some rough-draft diagrams of how I _want_ wget to look eventually, but I'm not done with those, and anyway they wouldn't help you with wget now. Even if you had the UML diagrams for the current state, you'd still need to understand the sources; I really don't think they'd help you much.

More important than understanding the design is understanding what needs to be done; we're still getting a grip on that. My current thought is that there should be a --query-reject (and probably --query-accept, though the former seems far more useful) that should be matched against key/value pairs; thus, --query-reject 'foo=bar&action=edit' would reject anything that has foo=bar and action=edit as key/value pairs in the query string, even if they're not actually next to each other; an example rejected URL might be http://example.com/index.php?a=b&action=edit&token=blah&foo=bar&hergle. Not all query strings are in the key=value format, so --query-reject 'abc1254' would be allowed, and match against the entire query string.

For an idea how URL filename matching is currently done, you might check out acceptable() in src/util.c and the functions it calls, to get an idea of how query matching might be implemented. However, I'll probably tackle this bug myself pretty soon if no one else has managed it yet, as I'm very interested in getting Wget 1.12 finished before long into the new year (ideally, _before_ the new year, but that probably ain't gonna happen).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
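The proposed --query-reject semantics could be prototyped roughly like this in Python (the option is not implemented in wget; `query_reject` is an invented name, and the any-order key/value interpretation is taken from the message above):

```python
from urllib.parse import urlsplit, parse_qsl

def query_reject(url, rule):
    """Sketch of the proposed --query-reject behavior: reject the URL if
    every key=value pair in the rule appears somewhere in the URL's
    query string, regardless of order or adjacency."""
    query = dict(parse_qsl(urlsplit(url).query))
    wanted = dict(pair.split('=', 1) for pair in rule.split('&'))
    return all(query.get(k) == v for k, v in wanted.items())

url = 'http://example.com/index.php?a=b&action=edit&token=blah&foo=bar&hergle'
assert query_reject(url, 'foo=bar&action=edit')   # both pairs present
assert not query_reject(url, 'foo=baz')           # value differs
```

A real implementation would also need the fallback mentioned above: rules that aren't in key=value form would be matched against the whole query string instead.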
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> I'm trying to download the same file from the same server; the command
> line I use:
>
>   wget --debug -o log -c -t 0 --load-cookies=cookie_file \
>     http://rapidshare.com/files/153131390/Blind-Test.rar

This is currently being tracked at https://savannah.gnu.org/bugs/?24662

A similar and related bug report is at https://savannah.gnu.org/bugs/?24642 in which the logs show that rapidshare.com also issues erroneous Content-Range information when it responds with a 206 Partial Content, which exercised a different regression* introduced in 1.11.x.

* It's not really a regression, since it's desirable behavior: we now determine the size of the content from the Content-Range header, since Content-Length is often missing or erroneous for partial content. However, in this instance of server error, it resulted in less-desirable behavior than the previous version of Wget. Anyway...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: re-mirror + no-clobber
Jonathan Elsas wrote:
> ... I've issued the command
>
>   wget -nc -r -l inf -H -D www.example.com,www2.example.com http://www.example.com
>
> but I get the message:
>
>   file 'www.example.com/index.html' already there; not retrieving.
>
> and the process exits. According to the man page, files with a .html
> suffix will be loaded off disk and parsed, but this does not appear
> to be happening. Am I missing something?

Yes. It has to download the files before they can be loaded from the disk and parsed. When it encounters a file at a given location, it doesn't have any way to know that that file corresponds to the one it's trying to download.

Timestamping with -N may be more what you want, rather than -nc?

I'm open to suggestions on clarifying the documentation.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: accept/reject rules based on querystring
Gustavo Ayala wrote:
> Any ideas about when this option (or an acceptable workaround) will
> be implemented? I need to include/exclude based on the querystring
> (with regular expressions, of course). The file name is not enough.

I consider it an important feature, and currently expect to implement it for 1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: A/R matching against query strings
I sent the following last month but didn't get any feedback. I'm trying one more time. :)

-M

Micah Cowan wrote:
> On expanding current URI acc/rej matches to allow matching against
> query strings, I've been considering how we might enable/disable this
> functionality, with an eye toward backwards compatibility.
>
> It seems to me that one usable approach would be to require the ?
> query string to be an explicit part of the rule, if it's expected to
> be matched against query strings. So -A .htm,.gif,*Action=edit* would
> all result in matches against the filename portion only, but
> -A '\?*Action=edit*' would look for Action=edit within the
> query-string portion. (The '\?' is necessary because otherwise '?' is
> a wildcard character; [?] would also work.)
>
> The disadvantage of that technique is that it's harder to specify
> that a given string should be checked _anywhere_, regardless of
> whether it falls in the filename or query-string portion; but I can't
> think offhand of any realistic cases where that's actually useful. We
> could also supply a --match-queries option to turn on matching of
> wildcard rules for anywhere (non-wildcard suffix rules should still
> match only at the end of the filename portion).
>
> Another option is to use a separate -A-like option that does what -A
> does for filenames, but matches against query strings. I like this
> idea somewhat less.
>
> Thoughts?
Re: A/R matching against query strings
Tony Lewis wrote:
> Micah Cowan wrote:
>> On expanding current URI acc/rej matches to allow matching against
>> query strings, I've been considering how we might enable/disable
>> this functionality, with an eye toward backwards compatibility.
>
> What about something like --match-type=TYPE (with accepted values of
> all, hash, path, search)? For the URL
> http://www.domain.com/path/to/name.html?a=true#content
>
>   all    would match against the entire string
>   hash   would match against "content"
>   path   would match against "path/to/name.html"
>   search would match against "a=true"
>
> For backward compatibility, the default should be --match-type=path.
> I thought about having "host" as an option, but that duplicates
> another option.

As does path (up to the final /).

Would hash really be useful, ever? It's never part of the request to the server, so it's really more context to the URL than a real part of the URL, as far as requests go. Perhaps that sort of thing could best wait for when we allow custom URL-parsers/filters.

Also, I don't like the name "search" overly much, as that's a very limited description of the much more general use of query strings.

But differentiating between three or more different match types tilts me much more strongly toward some sort of shorthand, like the explicit need for \?; with three types, perhaps we'd just use some special prefix for patterns to indicate which sort of match we want (:q: for query strings, :a: for all, or whatever), to save on prefixing each different type of match with --match-type (or just using "all" for everything).

OTOH, regex support is easy enough to add to Wget, now that we're using gnulib; we could just leave wildcards the way they are, and introduce regexes that match everything. Then query strings are '\?.*foo=bar' (or, for the really pedantic, '\?([^?]*&)?foo=bar(&[^?]*)?$').

That last one, though, highlights how cumbersome it is to do proper matching against typical HTML form-generated query strings (it's not really even possible with wildcards). Perhaps a more appropriate pattern-matcher specifically for query strings would be a good idea. It's probably enough to do something like --query-accept='action=Edit', where there's an implied '\?([^?]*&)?' before, and '(&[^?]*)?$' after.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
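The implied-prefix/suffix idea in the last paragraph can be sketched in Python (a hypothetical matcher; the '&'-aware wrapping follows the pedantic regex discussed above, and `query_matches` is an invented name):

```python
import re

def query_matches(url, pattern):
    """Wrap the user's key=value pattern with an implied '\\?([^?]*&)?'
    prefix and '(&[^?]*)?$' suffix, so it matches one whole form field
    anywhere in the query string."""
    full = r'\?([^?]*&)?' + re.escape(pattern) + r'(&[^?]*)?$'
    return re.search(full, url) is not None

assert query_matches('http://e.com/i.php?action=Edit', 'action=Edit')
assert query_matches('http://e.com/i.php?a=b&action=Edit&c=d', 'action=Edit')
# 'action=Editor' must not match: the implied suffix requires '&' or end.
assert not query_matches('http://e.com/i.php?action=Editor', 'action=Edit')
```

The last assertion is the point of the pedantic version: a plain '\?.*action=Edit' would accept 'action=Editor' too.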
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> Hello! Starting with version 1.10, wget has a very annoying bug: if
> you try to download an already fully downloaded file, wget begins to
> download it all over, but 1.9.1 says "Nothing to do", as it should.

It all depends on what options you specify. That's as true for 1.9 as it is for 1.10 (or the current release, 1.11.4). It can also depend on the server; not all of them support timestamping or partial fetches.

Please post the minimal log that exhibits the problem you're experiencing.

Thanks,
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Incorrect transformation of newline symbols
Александр Вильнин wrote:
> Hello! I've noticed a possible mistake in ftp-basic.c. When I try to
> download a file from ftp://www.delorie.com/pub/djgpp/current/ (in my
> case it was ftp://www.delorie.com/pub/djgpp/current/FILES), the
> server responds with error 550, but this file actually exists. I've
> used the cygwin command
>
>   wget --verbose --debug --output-file=wget_djgpp_log \
>     --directory-prefix=djgpp ftp://www.delorie.com/pub/djgpp/current/FILES
>
> to get this file. In the function ftp_request (ftp-basic.c), newline
> characters are substituted with ' ', but the ftp server doesn't
> understand such commands. The SIZE and RETR commands do not pass.
> I've inserted the debug log at the end of this message.

The problem isn't that newlines are substituted. Newlines and carriage returns are simply not safe within FTP file names. However, how did the newline get there in the first place? The real file name itself doesn't have a newline in it.

The logs clearly show that Wget was passed a URL with a carriage return (not a newline) in it. This strongly indicates that the shell you were using passed it that way to Wget. Probably, the shell was given \r\n when you hit Enter to end your command, and stripped away the \n but left the \r, which it passed to Wget. The bug you are encountering is in your Cygwin+shell environment; you'll have to look there.

The only deficiency I'm seeing on Wget's part from these logs is that it's calling \015 a "newline" character, when in fact the newline character is \012; it should say "line-ending character" or some such.

HTH,
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
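A minimal sketch of the diagnosis above: the stray character is a carriage return left at the end of the URL argument, and CR/LF cannot appear inside FTP pathnames because they would end the control-channel command early. (Both helper names here are invented for illustration.)

```python
def ftp_safe(name):
    """CR and LF are unsafe in FTP pathnames: they would terminate the
    control-channel command line early, as described above."""
    return '\r' not in name and '\n' not in name

def strip_trailing_cr(arg):
    """Hypothetical pre-processing a shell wrapper could apply so that
    a stray \\r from a \\r\\n line ending never reaches wget."""
    return arg.rstrip('\r\n')

url = 'ftp://www.delorie.com/pub/djgpp/current/FILES\r'
assert not ftp_safe(url)                       # the reported situation
assert ftp_safe(strip_trailing_cr(url))        # sanitized argument is fine
assert strip_trailing_cr(url).endswith('/FILES')
```

This matches the message's conclusion: the fix belongs in the environment handing wget the URL, not in ftp-basic.c.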
Re: Support for file://
David wrote:
> Hi Micah,
>
> You're right - this was raised before, and in fact it was a feature
> Mauro Tortonesi intended to be implemented for the 1.12 release, but
> it seems to have been forgotten somewhere along the line. I wrote to
> the list in 2006 describing what I consider a compelling reason to
> support file://. Here is what I wrote then:
>
> At 03:45 PM 26/06/2006, David wrote:
>> In replies to the post requesting support of the file:// scheme,
>> requests were made for someone to provide a compelling reason to
>> want to do this. Perhaps the following is such a reason.
>>
>> I have a CD with HTML content (it is a CD of abstracts from a
>> scientific conference); however, for space reasons not all the
>> content was included on the CD - there remain links to figures and
>> diagrams on a remote web site. I'd like to create an archive of the
>> complete content locally by having wget retrieve everything and
>> convert the links to point to the retrieved material.
>>
>> Thus the wget functionality when retrieving the local files should
>> work the same as if the files were retrieved from a web server
>> (i.e. the input local file needs to be processed, both local and
>> remote content retrieved, and the copies made of the local and
>> remote files all need to be adjusted to now refer to the local copy
>> rather than the remote content). A simple shell script that runs cp
>> or rsync on local files without any further processing would not
>> achieve this aim.

Fair enough. This example at least makes sense to me. I suppose it can't hurt to provide this, so long as we document clearly that it is not a replacement for cp or rsync, and is never intended to be (it won't handle attributes and special file properties).

However, support for file:// will introduce security issues; care is needed. For instance, file:// should never be respected when it comes from the web. Even on the local machine, it could be problematic to use it on files writable by other users (as they can then craft links to download privileged files with upgraded permissions). Perhaps files that are only readable by root should always be skipped, or wget should require a --force sort of option if the current mode can result in more permissive settings on the downloaded file. Perhaps it would be wise to make this a configurable option. It might also be prudent to enable an option for file:// to be disallowed for root.

https://savannah.gnu.org/bugs/?24347

If any of you can think of additional security issues that will need consideration, please add them in comments to the report.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
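The "never respect file:// when it comes from the web" rule could look something like this (a policy sketch under assumed semantics; `may_follow` and the `allow_file` flag are invented, not wget options):

```python
from urllib.parse import urlsplit

def may_follow(link_url, parent_url, allow_file=False):
    """Sketch of the security policy discussed above: file:// links are
    followed only when explicitly enabled, and never when the page
    linking to them was itself fetched from the network."""
    if urlsplit(link_url).scheme != 'file':
        return True
    if not allow_file:
        return False
    # A remote (http/https/ftp) page must not reach into the local disk.
    return urlsplit(parent_url).scheme == 'file'

# A web page pointing at a local file is refused even when file:// is enabled:
assert not may_follow('file:///etc/passwd', 'http://example.com/', allow_file=True)
# The CD-archive use case above: local page, local figure, explicitly enabled:
assert may_follow('file:///cd/fig1.gif', 'file:///cd/abstract.html', allow_file=True)
# And file:// stays off unless asked for:
assert not may_follow('file:///cd/fig1.gif', 'file:///cd/abstract.html')
```

The permission-related concerns (other-writable files, root-only files) would need separate checks on the filesystem side and aren't modeled here.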
Re: Support for file://
Michelle Konzack wrote:
> Imagine you have a local mirror of your website and you want to know
> why the site @HOSTINGPROVIDER has some more files or such. You can
> spider the website @HOSTINGPROVIDER recursively into a local tmp1
> directory; then, with the same commandline, you can do the same with
> the local mirror and download the files recursively into tmp2. Now
> you can make a recursive fs-diff and know which files are used... on
> both the local mirror and @HOSTINGPROVIDER.

I'm confused. If you can successfully download the files from HOSTINGPROVIDER in the first place, then why would a difference exist? And if you can't, then this wouldn't be an effective way to find out.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Big files
Cristián Serpell wrote:
> It is the latest Ubuntu distribution, which still comes with the old
> version. Thanks anyway, that was the problem.

I know that's untrue. Ubuntu comes with 1.10.2 at least, and has for quite some time. If you're using that, then it's probably a different bug than Doruk and Tony were thinking of (perhaps one of the cases of content-length mishandling that were recently fixed in the 1.11.x series).

IIRC, Intrepid Ibex (Ubuntu 8.10) will have 1.11.4.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Hiding passwords found in redirect URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Thomas Corthals wrote: Micah Cowan wrote: Note: Saint Xavier has already written a fix for this, so it's not actually a question of whether it's worth the bother, just whether it's actually desired behavior. Since it's desired in some situations but maybe not in others, the best solution would be to provide a switch for it that can be used in a user's .wgetrc and on the command line. Well, yes, except I can't really imagine anyone ever _using_ such a switch. Though I could envision people using the .wgetrc option. Still seems like a lot of trouble to make a new option for such a little thing. One could always use -nv in a pinch. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIzBiU7M8hyUobTrERAkchAJ9vajvughHFXR8yAJPPGt4YkaGY8ACfYXCR vPCAZaYsRN6VcisBjDkmdzI= =wMVt -END PGP SIGNATURE-
A/R matching against query strings
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On expanding current URI acc/rej matches to allow matching against query strings, I've been considering how we might enable/disable this functionality, with an eye toward backwards compatibility. It seems to me that one usable approach would be to require the ? query string to be an explicit part of the rule, if it's expected to be matched against query strings. So -A .htm,.gif,*Action=edit* would all result in matches against the filename portion only, but -A '\?*Action=edit*' would look for Action=edit within the query-string portion. (The '\?' is necessary because otherwise '?' is a wildcard character; [?] would also work.) The disadvantage of that technique is that it's harder to specify that a given string should be checked _anywhere_, regardless of whether it falls in the filename or query-string portion; but I can't think offhand of any realistic cases where that's actually useful. We could also supply a --match-queries option to turn on matching of wildcard rules anywhere (non-wildcard suffix rules should still match only at the end of the filename portion). Another option is to use a separate -A-like option that does what -A does for filenames, but matches against query strings. I like this idea somewhat less. Thoughts? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIyrXz7M8hyUobTrERAk+5AJ0ckiE4+bEMEFe9aD8bBNY3HH+IZACdERCs wab0TyBLCbW/6DYm+8gAExM= =pwb/ -END PGP SIGNATURE-
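The proposal above can be sketched briefly. This is illustrative Python, not Wget's C code; the '\?' / '[?]' prefix marking a query-string rule is taken from the proposal, while the function name and everything else here are assumptions made for the sketch:

```python
import fnmatch
from urllib.parse import urlsplit

def rule_matches(url, rule):
    # Proposed semantics (sketch): a rule that begins with an escaped
    # '?' -- written '\?' or '[?]' -- is matched against the query
    # string; any other rule is matched against the filename portion
    # only, as -A/-R do today.
    parts = urlsplit(url)
    filename = parts.path.rsplit('/', 1)[-1]
    if rule.startswith('\\?'):
        return fnmatch.fnmatch(parts.query, rule[2:])
    if rule.startswith('[?]'):
        return fnmatch.fnmatch(parts.query, rule[3:])
    return fnmatch.fnmatch(filename, rule)

url = 'http://wiki.example/index.php?title=Foo&action=edit'
print(rule_matches(url, '\\?*action=edit*'))  # query-string rule: matches
print(rule_matches(url, '*action=edit*'))     # filename-only rule: no match
```

(Real -A/-R rules without wildcards are treated as filename suffixes; that detail is omitted here.)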
Hiding passwords found in redirect URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 https://savannah.gnu.org/bugs/index.php?21089 The report originator is copied in the recipients list for this message. The situation is as follows: the user types 'wget http://foo.com/file-i-want'. Wget asks the HTTP server for the appropriate file, and gets a 302 redirection to the URL ftp://spag:[EMAIL PROTECTED]. Wget will then issue to the log output the line: Location: ftp://spag:[EMAIL PROTECTED]/mickie/file-you-want with the password in plain view. I'm uncertain that this is actually a problem. In this specific case, it's a publicly-accessible URL redirecting to a password-protected file. What's to hide, really? Of course, the case gets more interesting when it's _not_ a publicly-accessible URL. What about when the password is generated from one the user supplied? That is, the original request was http://spag:[EMAIL PROTECTED]/file-i-want, which resulted in a redirect using the same username/password? Especially if it was an HTTPS request rather than plain HTTP. A case could be made that it should be hidden in that case. On the other hand, in cases like the _original_ example given above, I'd argue that hiding it could be the wrong thing: the user now has no idea how to directly access the file, avoiding the redirect the next time around. Redirecting to a password-protected file on a different host or using a different scheme seems broken to me in the first place, and I'm sorta leaning towards not bothering about it. What are your thoughts, list? Note: Saint Xavier has already written a fix for this, so it's not actually a question of whether it's worth the bother, just whether it's actually desired behavior. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. 
GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIytyT7M8hyUobTrERAnC1AJ4pRpWx7z6wRt3Vg4LHyQalEfL3XQCdGTqg LdK8lQ8tuPTlmCfURcjXPw4= =ZPrY -END PGP SIGNATURE-
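The fix under discussion amounts to redacting the password from a URL before it is echoed in a Location: log line. A minimal sketch of the idea (hypothetical helper in Python; Saint Xavier's actual patch to Wget may differ in both naming and behavior):

```python
from urllib.parse import urlsplit, urlunsplit

def redact_userinfo(url):
    # Replace an embedded password with '***' so log output such as
    # "Location: ftp://user:secret@host/..." no longer exposes it.
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = '%s:***@%s' % (parts.username, parts.hostname)
    if parts.port is not None:
        netloc += ':%d' % parts.port
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))

print(redact_userinfo('ftp://spag:secret@example.org/mickie/file-you-want'))
# ftp://spag:***@example.org/mickie/file-you-want
```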
Re: Where is program_name?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Saint Xavier wrote: Hi, * Gisle Vanem ([EMAIL PROTECTED]) wrote: 'program_name' is used in lib/error.c, but it is not allocated anywhere. Should it be added to main.c and initialised to exec_name?

$ cd wget-mainline
$ find . -name '*.[ch]' -exec fgrep -H -n 'program_name' '{}' \;
./lib/error.c:63:# define program_name program_invocation_name
                 ^^^
./lib/error.c:95:/* The calling program should define program_name and set it to the
                                                     ^^^

Looks to me like we're expected to supply it. Line 63 is only evaluated when we're using glibc; otherwise, we need to provide it. The differing name is probably so we can define it unconditionally. It appears that lib/error.c isn't even _built_ on my system, perhaps because glibc supplies what it would fill in. This makes testing a little difficult. Anyway, see if this fixes your trouble:

diff -r 0c2e02c4f4f3 src/ChangeLog
--- a/src/ChangeLog	Tue Sep 09 09:29:50 2008 -0700
+++ b/src/ChangeLog	Tue Sep 09 09:40:00 2008 -0700
@@ -1,3 +1,7 @@
+2008-09-09  Micah Cowan  [EMAIL PROTECTED]
+
+	* main.c: Define program_name for lib/error.c.
+
 2008-09-02  Gisle Vanem  [EMAIL PROTECTED]

 	* mswindows.h: Must ensure stdio.h is included before
diff -r 0c2e02c4f4f3 src/main.c
--- a/src/main.c	Tue Sep 09 09:29:50 2008 -0700
+++ b/src/main.c	Tue Sep 09 09:40:00 2008 -0700
@@ -826,6 +826,8 @@
   exit (0);
 }

+char *program_name;	/* Needed by lib/error.c. */
+
 int
 main (int argc, char **argv)
 {
@@ -833,6 +835,8 @@
   int i, ret, longindex;
   int nurl, status;
   bool append_to_log = false;
+
+  program_name = argv[0];

   i18n_initialize ();

- -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxqf67M8hyUobTrERAq0+AJ9KIOFDn9FiDXIIlU6M7DsupDmPYQCcDuoo 9bgAQnuKpgYMvnwc18svfYg= =DXYi -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote: On Mon, 8 Sep 2008, Donald Allen wrote: The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts needs. Then you read the capture and replay them as closely as possible using your tool. As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies just to make sure you have a javascript and cookie enabled browser. So you need to work hard(er) when trying this with non-browsers. It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort. I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies and --header cookie info from livehttpheaders and that did work. That's how I did it as well (except I got the headers from tcpdump); I'm using Firefox 3, so don't have access to FF's new SQLite-based cookies file (apart from the patch at http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch). Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess. 
Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. - --keep-session-cookies and --save-cookies=foo.txt make a good combination. Is there a way to ask wget to print the headers it sends (ala livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it. - --debug - -- HTH, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7 1vOmTDimFg8E7Cn+Q+HGZn8= =JKXH -END PGP SIGNATURE-
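For reference, --load-cookies reads the Netscape cookies.txt format, and session cookies are exactly what that file omits. A sketch of how such a file maps to a Cookie: header (illustrative Python with a simplified domain check; not Wget's actual cookie engine):

```python
def cookie_header(cookies_txt, host):
    # Netscape format: one tab-separated line per cookie --
    # domain, include-subdomains flag, path, secure, expiry, name, value.
    # Session cookies are never written to this file, which is why a
    # browser-exported file can lack the cookies a login depends on.
    pairs = []
    for line in cookies_txt.splitlines():
        if not line or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) != 7:
            continue
        domain, _flag, _path, _secure, _expiry, name, value = fields
        if host == domain.lstrip('.') or host.endswith(domain):
            pairs.append('%s=%s' % (name, value))
    return '; '.join(pairs)

sample = ('# Netscape HTTP Cookie File\n'
          '.example.com\tTRUE\t/\tFALSE\t2000000000\tB\tabc123\n')
print(cookie_header(sample, 'www.example.com'))  # B=abc123
```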
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference, because when I included them (from the livehttpheaders output) and tried sending the cookies manually with --header, I got the same page back with wget that indicated that yahoo knew I was logged in, formatted with my preferences. Perhaps you missed this in my last message: Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. --keep-session-cookies and --save-cookies=foo.txt make a good combination. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq nCqAmXJfU3kTncMQkKk0JZo= =17Yr -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh M57W3Reqj+/pO8GuDwb9Nok= =ajp/ -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget. Yes; and that's entirely my fault, as I didn't explicitly say that. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content. This behavior is a little surprising to me from Yahoo!. 
It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised. Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase You're not using Internet Explorer. :p - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik 3HbbATyqnrm0hAJXqNTqpl4= =3XD/ -END PGP SIGNATURE-
Re: Hello, All and bug #21793
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 David Coon wrote: Hello everyone, I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793. Hi David, and welcome! If you haven't already, please see http://wget.addictivecode.org/HelpingWithWget I'd encourage you to get a Savannah account, so I can assign that bug to you. Also, I tend to hang out quite a bit on IRC (#wget @ irc.freenode.net), so you might want to sign on there. Since you mentioned an interest in Unicode and UTF-8, you might want to check out Saint Xavier's recent work on IRI and iDNS support in Wget, which is available at http://hg.addictivecode.org/wget/sxav/. Among other things, sxav's additions make Wget more aware of the user's locale, so it might be useful for providing a feature to automatically transcode filenames to the user's locale, rather than just supporting UTF-8 only (which should still probably remain an explicit option). If that sounds like the direction you'd like to take it, you should probably base your work on sxav's repository, rather than mainline. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxViR7M8hyUobTrERAv/jAJ9/DxAaPaYpdLJojX9gorHn2hqwSACeK7oD veVZAIH2NjbYI8dG6DimjRg= =9Qau -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: There was a recent discussion concerning using wget to obtain pages from yahoo logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki. I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url' The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. Are you signing into the main Yahoo! site? When I try to do so, whether I use the cookies or no, I get a message about update your browser to something more modern or the like. The difference appears to be a combination of _both_ User-Agent (as you've done), _and_ --header Accept-Encodings: gzip,deflate. This plus appropriate cookies gets me a decent logged-in page, but of course it's gzip-compressed. Since Wget doesn't currently support gzip-decoding and the like, that makes the use of Wget in this situation cumbersome. Support for something like this probably won't be seen until 1.13 or 1.14, I'm afraid. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxdw77M8hyUobTrERAi/QAJ0atPMeUQ/0YCNwAP+XiH4nDyvclwCcDxYo obud0CjpATBYDvA0eS3ZHGY= =vv4R -END PGP SIGNATURE-
Re: [wget-notify] add a new option
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 houda hocine wrote: Hi, Hi houda. This message was sent to wget-notify, which was not the proper forum. Wget-notify is reserved for bug-change and (previously) commit notifications, and is not intended for discussion (though I obviously haven't blocked discussions; the original intent was to be able to discuss commits, but I'm not sure I need to allow discussions any more, so it may be disallowed soon). The appropriate list would be wget@sunsite.dk, to which this discussion has been redirected. We created a new format for archiving (.warc), and we want to ensure that wget generates this format directly from the input URL. Can you help me with some ideas to achieve this new option? The format is (warc -wget url) I am in the process of trying to understand the source code to add this new option. Which .c file allows me to do this? Doing this is not likely to be a trivial undertaking: the current file-output interface isn't really abstracted enough to allow this, so basically you'll need to modify most of the existing .c files. We are hoping at some future point to allow for a more generic output format, for direct output to (for instance) tarballs and .mhtml archives. At that point, it'd probably be fairly easy to write extensions to do what you want. In the meantime, though, it'll be a pain in the butt. I can't really offer much help; the best way to understand the source is to read and explore it. However, on the general topic of adding new options to Wget, Tony Lewis has written the excellent guide at http://wget.addictivecode.org/OptionsHowto. Hope that helps! Please note that I won't likely be entertaining patches to Wget to make it output to non-mainstream archive formats, and even once generic output mechanisms are supported, the mainstream archive formats will most likely be supported as extension plugins or similar, and not as built-in support within Wget. - -- Micah J. 
Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvbyf7M8hyUobTrERApl8AJwNvWOdDd0Z//wbNzN/jyZFqKI5iQCfQOx4 3zlxPGaVqjsPhwa7ZwB4wrs= =Zy+N -END PGP SIGNATURE-
Re: Checking out Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 vinothkumar raman wrote: Hi all, I need to check out the complete source onto my local hard disk. I am using WinCVS; when I searched for the module, it said that there is no module information out there. Could anyone help me out? I am a complete novice in this regard. WinCVS won't work, because there _is_ in fact no CVS module for Wget. Wget uses Mercurial as the source repository (and was using Subversion prior to that). For more information about the Wget source repository and its use, see http://wget.addictivecode.org/RepositoryAccess That page focuses on using the hg command-line tool; you may prefer to use TortoiseHg instead, http://tortoisehg.sourceforge.net/. The page does offer additional information about the repository and what is required to build from those sources. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb4n7M8hyUobTrERAnquAJ9ItMQH1QYgXvyYTI6/IZDScIFGoACfVlqd p+LMC9AK5/SwYPyuGVfd5Ns= =RmLO -END PGP SIGNATURE-
Re: [BUG:#20329] If-Modified-Since support
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 vinothkumar raman wrote: We need to give out the time stamp of the local file in the request header; for that, we need to pass the local file's time stamp from http_loop() to get_http(). The only way to pass this on without altering the signature of the function is to add a field to struct url in url.h. Could we go for it? That is acceptable. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb5B7M8hyUobTrERAv2YAJ0ajYx+pynFLtV2YmEw7fA+vwf8ugCfSaU1 AFkIYSyyyS4egbyXjzBLXBo= =fIT5 -END PGP SIGNATURE-
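The header being discussed is simple to form once the local file's mtime is available to the HTTP code; a sketch in Python of the value that would be sent (Wget's real implementation is C, in the http_loop()/get_http() functions mentioned above):

```python
import os
from email.utils import formatdate

def if_modified_since(mtime):
    # Format a Unix timestamp as an RFC 1123 HTTP-date for the
    # If-Modified-Since request header; the server answers 304 Not
    # Modified if the resource is no newer than this.
    return 'If-Modified-Since: ' + formatdate(mtime, usegmt=True)

# For a local file, the timestamp to send would be its mtime, e.g.
# if_modified_since(os.path.getmtime('index.html')).
print(if_modified_since(0))
# If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT
```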
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Yes, that's what it means. I'm not yet committed to doing this. I'd like to see first how many mainstream servers will respect If-Modified-Since when given as part of an HTTP/1.0 request (in comparison to how they respond when it's part of an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not in HTTP/1.1, that'd be an excellent case for holding off until we're doing HTTP/1.1 requests. Also, I don't think removing the previous HEAD request code is entirely accurate: we probably would want to detect when a server is feeding us non-new content in response to If-Modified-Since, and adjust to use the current HEAD method instead as a fallback. - -Micah vinothkumar raman wrote: This mean we should remove the previous HEAD request code and use If-Modified-Since by default and have it to handle all the request and store pages if it is not returning a 304 response Is it so? On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote: Follow-up Comment #4, bug #20329 (project wget): verbatim-mode's not all that readable. The gist is, we should go ahead and use If-Modified-Since, perhaps even now before there's true HTTP/1.1 support (provided it works in a reasonable percentage of cases); and just ensure that any Last-Modified header is sane. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb7t7M8hyUobTrERAsvQAJ4k7fKrsFtfC4MQtuvE3Ouwz6LseACePqt2 8JiRBKtEhmcK3schVVO347A= =yCJV -END PGP SIGNATURE-
Re: Support for file://
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Petri Koistinen wrote: Hi, It would be nice if wget would also support file://. Feel free to file an issue for this (I'll mark it Needs Discussion and set it at low priority). I'd thought there was already an issue for this, but can't find it (either open or closed). I know this has come up before, at least. I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which groks urls, Wget W(eb)-gets, and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system cp command is likely to handle things much better than Wget. In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system cp command, but I might conceivably not mind file:// support if it means simple _content_ transfer, and not actual file duplication. Also in need of addressing is what recursion should mean for file://. Between ftp:// and http://, recursion currently means different things. In FTP, it means traverse the file hierarchy recursively, whereas in HTTP it means traverse links recursively. I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. 
GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvcLq7M8hyUobTrERAl6YAJ9xeTINVkuvl8HkElYlQt7dAsUfHACfXRT3 lNR++Q0XMkcY4c6dZu0+gi4= =mKqj -END PGP SIGNATURE-
Re: How to debug wget ?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jinhui Li wrote: I am browsing the source code and want to debug it to figure out how it works. So, somebody please tell me how to debug (with GDB) or where I can find the information that I need. IMO, GDB is a great tool for diagnosing a particular problem one encounters with a program; it's not all that terribly useful for actually understanding the code itself, though. I find it much quicker to read through the code using a powerful viewer or editor, and making use of tools such as cscope and ctags. The best editors, such as Vim and Emacs, are integrated with these tools, and so a simple control-click or key combination can bring up the definition of the function being called or the variable being referenced, or (in the case of cscope) the list of places where a particular function is being called, etc. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvcPD7M8hyUobTrERAsCEAJ9oQDJWzD/OPAvzvgJorlByd4YqyACfdLM1 GmQUVu/xnQ7HOr493hiWG28= =0XwB -END PGP SIGNATURE-
Corrections to earlier discussion
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi sxav, So, 3.2 is the wrong section to pull from: what we already _have_ are IRIs; we're converting them to URIs. So, section 3.1 applies, not 3.2. The two-step process described by section 3.1 does not allow already-percent-encoded values to be transformed: only international characters will be percent-encoded. In particular, this means that you will not need to distinguish whether a percent-encoded sequence represents a valid UTF-8 character: all percent-encoded sequences should be passed through to the resulting URI as they appeared originally. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIuGsh7M8hyUobTrERAtSYAJwKZDeb7pCQWq0+XAJNcCZ4Ay0qmACfX3ia ERSpkhiiQsLJ8SdqUSktZLQ= =rF5p -END PGP SIGNATURE-
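In code, the section 3.1 step described above looks roughly like this (an illustrative Python sketch of RFC 3987's IRI-to-URI mapping, not Wget's implementation). Percent signs are ASCII, so existing percent-encoded sequences pass through untouched:

```python
def iri_to_uri(iri):
    # Percent-encode each non-ASCII character as its UTF-8 octets;
    # everything ASCII -- including existing '%XX' sequences -- is
    # passed through exactly as it appeared (RFC 3987, section 3.1).
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend('%%%02X' % b for b in ch.encode('utf-8'))
    return ''.join(out)

print(iri_to_uri('http://example.org/r%C3%A9sum%C3%A9/é'))
# http://example.org/r%C3%A9sum%C3%A9/%C3%A9
```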
Re: Corrections to earlier discussion
Micah Cowan wrote: Hi sxav, Er, yeah, that had been meant to go to [EMAIL PROTECTED], not [EMAIL PROTECTED] Whoopsy! :) -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/
Re: Wget function
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 karlito wrote: Hello, First of all I would like to thank you for your great tool. I have a request: I use this command to save a URL with absolute links, and it works very well: wget -k http://www.google.fr/ But I want to save the file under a name other than index.html, for example google-is-good.html. I have tried this: wget -k --output-document=google-is-good.html http://www.google.fr/ It works, except that I lose the absolute links, and that's terrible. Yeah. Conversions won't work with --output-document, which behaves rather like a shell redirection. I don't know how to fix this problem; which combination do I have to use to run wget -k with another name? You could always rename it afterwards. In your specific case, the current development sources (which will become Wget 1.12) have a --default-page=google-is-good.html option for specifying the default page name, thanks to Joao Ferreira. It's not yet available in any release. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIsv3N7M8hyUobTrERAskoAJ4lHZK+VEBWYuFzOtbd57wEEvYm0wCdEVSK el6v3e0TkKpQtOG2b5ZiHcI= =/+sB -END PGP SIGNATURE-
Re: WGET :: [Correction de texte]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tom wrote (quoting Wget's French help output): Téléchargement récursif: -r, --recursive spécifer un téléchargement récursif. -l, --level=NOMBRE _*profondeeur*_ maximale de récursion (inf ou 0 pour infini). Just one 'e' to remove from "profondeeur", and it will be fixed! This issue appears to have been fixed with the latest French translation. It will be released with Wget 1.12. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIswBE7M8hyUobTrERAufeAKCIl4ghMvo2JolNfsSAYCTd92v9OwCfS89O iT3urRXKctZuucXnOn9tGLc= =v5SC -END PGP SIGNATURE-
Re: [wish] quiet operation yet displaying the progress
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Maciej Pilichowski wrote: Hello, I call usually wget from my script in entirely quiet mode however it would be useful if wget could still show the progress -- currently wget either shows a lot of information (and progress) or does not show anything. In short something like this: --progress=bar -q -nv is understood as -q -nv Please treat such arguments (the former example) as stating "show only progress and nothing else". - -q -nv is a nonsensical combination; they say contradictory things. One says to emit only a little output; the other says to emit no output at all. A progress bar for -nv has already been requested, and is tracked at https://savannah.gnu.org/bugs/index.php?22448 I don't mind putting this into 1.12 if someone wants to write the patch; otherwise, I probably won't get to it for some time. I've got some doubts as to whether -nv --progress=bar is the right way to achieve this: is that the behavior we want if the user specified progress=bar in their wgetrc file and then gave the -nv command-line option? Then again, who puts progress=bar in their wgetrc? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIswSO7M8hyUobTrERAnF7AKCFvdBemlyNzH8aq+QcsdOCFOfAKwCdHBft WADc3rYLGJXpYfgDr/sKS4Q= =gxKn -END PGP SIGNATURE-
Re: Wget function
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Please keep the list in the replies.

karlito wrote:
> hi
>
> thank you for the reply. can my problem be fixed in the next version?
> it's for a batch; i have more than 1000 urls to process, so that is
> why i need to find a solution. also, when you say "rename", what is
> the function to rename with wget?

I mean, just use the mv or rename command on your operating system.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIswfR7M8hyUobTrERAubkAJ0VL2UPnNQtD27waPVwFkeUwbUp9wCfXerh
dZBr4e7ZBKcEE5Kzrjv1mi8=
=GoKL
-END PGP SIGNATURE-
Re: wget and wiki crawling
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

asm c wrote:
> I've recently been using wget, and got it working for the most part,
> but there's one issue that's really been bugging me. One of the
> parameters I use is '-R *action=*,*oldid=*' (side note on the
> platform: ZSH on NetBSD on the SDF public access unix system, although
> I've also used it on windows with the same result). The purpose of
> this parameter is so that, when wget crawls a mid-sized wiki I'd like
> to have a local copy of, it doesn't bother with all the history pages,
> edit pages, and so forth. Not downloading these would save me an
> enormous amount of time.
>
> Unfortunately, the parameter is ignored until after the php page is
> downloaded. So, because it waits until it's downloaded to delete it,
> using the param doesn't really help at all. Does anyone know how I can
> stop wget from even downloading matching pages?

Well, you don't mention it, but I'll assume that those patterns occur in the query-string portion of the URL: that is, they follow a question mark (?) that appears at some point.

Unfortunately, the -R and -A options apply only to the filename portion of the URL: that is, whatever falls between the last slash (/) and the first question mark. Confusingly, they are also applied _after_ files are downloaded, to determine whether they should be deleted after the fact: so Wget probably downloads those files you really wish it wouldn't, and then deletes them afterwards anyway. Worse, there's no way around this, currently.

This is part of a suite of problems that are currently slated to be addressed soon. The most pertinent to your problem, though, is the need for a way to match against query strings. I'm very much hoping to get around to this before the next major Wget release, version 1.12. It's being tracked here: https://savannah.gnu.org/bugs/index.php?22089

If you add yourself to the Cc list, you'll be able to follow along on its progress.

- --
Cheers!
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIr55d7M8hyUobTrERAu4KAJsHmDTZ46ioEGOTprdE/aTGrj853QCfet84
+c+npJnPwC/86/rLpn5rB8s=
=abdv
-END PGP SIGNATURE-
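For reference, later Wget releases (reportedly 1.14 and newer) grew exactly the capability asked about here: --accept-regex/--reject-regex match against the complete URL, query string included, and are applied before the download happens. A hedged sketch; the wiki hostname is a placeholder, and the wget line is shown in a comment since it needs network access:

```shell
# Sketch for later Wget versions (reportedly 1.14+), where the reject
# pattern is tested against the whole URL *before* downloading:
#   wget -r --reject-regex '(action=|oldid=)' http://wiki.example.org/
# The same regex, demonstrated against sample wiki URLs:
printf '%s\n' \
  'http://wiki.example.org/index.php?title=Foo' \
  'http://wiki.example.org/index.php?title=Foo&action=history' \
  'http://wiki.example.org/index.php?title=Foo&oldid=1234' \
  | grep -Ev '(action=|oldid=)'
# -> only the plain article URL survives the filter
```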
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tony Lewis wrote:
> Micah Cowan wrote:
>> The easiest way to do what you want may be to log in using your
>> browser, and then tell Wget to use the cookies from your browser,
>> using
>
> Given the frequency of the "log in and then download a file" use case,
> it should probably be documented on the wiki. (Perhaps it already is. :-)

Yeah, at http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected

I think you missed the final sentence of my how-to:

> (I'm going to put this up on the Wgiki FAQ now, at
> http://wget.addictivecode.org/FrequentlyAskedQuestions)

:)

(Back to you:)

> Also, it would probably be helpful to have a shell script to automate
> this.

I filed the following issue some time ago: https://savannah.gnu.org/bugs/index.php?22561

The report is low on details; but I was envisioning something that would spew out forms and their fields, accept values for fields in one form, and invoke the appropriate Wget command to do the submission.

I don't know if it could be _completely_ automated, since it's not 100% possible for the script to know which form fields are the ones it should be filling out. OTOH, there are some damn good heuristics that could be used: I imagine that the right form (in the event of more than one) can usually be guessed by seeing which one has a password-type input (assuming there's also only one of those). If that form has only one text-type input, then we've found the username field as well. Name-based heuristics (with "pass", "user", "uname", "login", etc.) could also help.

If someone wants to do this, that'd be terrific. It could probably reuse the existing HTML parser code from Wget. Otherwise, it'd probably be a while before I could get to it, since I've got higher priorities that have been languishing.

Such a tool might also be an appropriate place to add FF3 sqlite cookies support.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIrb0s7M8hyUobTrERAlVXAJ9YnAM7JiQrxrB/KclA1FXDnoVswgCdGO7t
Vaa98nhNRuEY4aLMx2BFXm0=
=ScoA
-END PGP SIGNATURE-
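The form-scraping heuristic described above can be roughed out with ordinary shell tools. This is only an illustrative sketch: the file login.html and the field names are hypothetical (the page would really come from something like `wget -q -O login.html http://HOSTNAME/login`), and a robust tool would use a real HTML parser rather than grep:

```shell
# Hypothetical sketch of the "find the login form's fields" heuristic.
# Fake a saved login page for illustration:
cat > login.html <<'EOF'
<form action="/doLogin.php" method="POST">
  <input type="text" name="s-login">
  <input type="password" name="s-pass">
</form>
EOF

# List every input field's name attribute; a form containing a
# password-type input is probably the login form.
grep -o '<input[^>]*>' login.html | grep -o 'name="[^"]*"'
# -> name="s-login"
#    name="s-pass"
```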
Upcoming Wget releases, issue reorganizations
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

In Savannah, the name of the field value for "Planned Release" that was previously 1.13 has just been renamed 1.14, and a new 1.13 target has been added. I'll be moving some items currently targeted at 1.12 to 1.13, and some items that have just been moved to 1.14 will get moved to the new 1.13 target. If you have bookmarks to the 1.13 set of bugs in Savannah, that link now goes to 1.14.

I've been very happy with the progress and improvements that have been made to Wget over the last several months. My own productivity, though, especially in the last couple of months, was somewhat less than I'd hoped it would be. In particular, taking on co-maintainer responsibilities with GNU Screen, and a brief hiatus to write GNU Teseq (a program to aid in debugging Screen), ate up quite a bit of time. I believe I'm close to stabilizing the balance between my work on Screen and my work on Wget, but I'm behind where I wanted to be.

In the meantime, we've already got several really terrifically useful features in the current tree, whose release I'd prefer not to hold back longer than necessary. I may choose to punt some of the improvements I'd been planning on Content-Disposition funkiness and such, the code cleanup, a bunch of small but not crucial fixes, and really anything else that looks like it might prevent us from releasing near the turn of the year.

Steven Schweda's copyright assignment is in for his nice batch of changes for better VMS build-support and myriad FTP-related fixes; I need to sift through a lot of that to see what we can pull in as-is and what I want to adjust somewhat. I'm hoping to get as much of that in for 1.12 as possible - particularly the FTP adjustments - but may need to punt some of it, even important bugfix pieces, until after the 1.12 release. If that's the case, though, I will ensure that http://hg.addictivecode.org/wget/schweda/vms/ is kept up-to-date with mainline, so that it will be essentially functional as a "1.12-plus-Schweda's-changes".

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIrfU07M8hyUobTrERAhG7AJ9bv2Q0vetKEcDhfPz2CEQEt+2b3gCeP207
0pu6CNB0sWrsbZqDaWZ7ddA=
=0ObC
-END PGP SIGNATURE-
Congratulations, GSoC students!
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Well, today was the final pencils down day for the Google Summer of Code program. It's been a great (and quick!) couple of months, and I'm excited by the results. Saint Xavier and Julien Buty have done great work on IRI/IDN support and better HTTP Authentication support. The international stuff will probably be merged into Wget quite soon; the HTTP Authentication project will be continuing for probably the next couple of months, and Julien Buty has enthusiastically volunteered to continue working on it beyond the GSoC program. I really, really appreciate the work that you've done, and hope that you've gained some valuable experience as well (or, at least, a couple of good lines for your CV :) ). Great job, guys! If either of you ever need a recommendation or a reference, don't hesitate to ask. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIqdWg7M8hyUobTrERAt3uAJ92Kh7oSLzVffj5Aaay2xNeOQZbdgCfShKo tIaIz+hlnwP/+2pWQS1e0h8= =BV8L -END PGP SIGNATURE-
Re: AW: AW: AW: Problem mirroring a site using ftp over proxy
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Juon, Stefan wrote:
> Well, here is the index.html (I'm not sure whether it is also
> accessible in the mailing list, as I sent it as an attachment?)

Sorry, I somehow failed to notice this post. :\

The index.html file that the proxy generated is invalid. Apparently it wants to tack on ^M (carriage return, \r) after every filename, as a literal part of the link. It looks like Wget doesn't even acknowledge links like that; but even if it did, it'd send a request to the proxy like:

  GET /CommonUpdater/avvdat-.zip%0D

rather than

  GET /CommonUpdater/avvdat-.zip

so it would still most likely fail to get a real file (though it _might_ work, if the proxy and/or the FTP server are a little sloppy).

One likely explanation for this, it seems to me, is that the proxy gets back the LIST response like:

  foo CR LF
  bar CR LF

and removes the LFs while leaving in the CRs, spitting them out as part of the link. That's really poor behavior, considering that FTP servers _ought_ to send CR LF (and not bare LF), as they're supposed to use telnet conventions.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIohiL7M8hyUobTrERApkmAJ9Ia9yvahBPtp0aJDZehKciEMc3vQCgjXSC
T9DYFPDUxtBEx6HvOnwBzos=
=MAXZ
-END PGP SIGNATURE-
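Assuming the diagnosis above is right (a proxy that leaves literal \r characters inside its generated listing), one stopgap is to scrub the carriage returns out of the saved index before feeding it back to a second wget run with -i/--force-html. A sketch; the fetch itself (something like `wget -q -O index.html 'ftp://ftpde.nai.com/CommonUpdater/'` through the proxy) is simulated locally here:

```shell
# Simulate the broken proxy-generated listing, with a stray \r inside
# the link (avvdat-.zip is the filename from the original report):
printf '<a href="avvdat-.zip\r">avvdat-.zip</a>\n' > index.html

# Strip every carriage return:
tr -d '\r' < index.html > index.clean.html

# index.clean.html can now be handed to something like
#   wget -i index.clean.html --force-html -B <base-url>
# without the trailing %0D ending up in the requests.
```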
Re: WGET :: [Correction de texte]
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Saint Xavier wrote:
> * Tom ([EMAIL PROTECTED]) wrote:
>> Hello!
>
> hello,
>
>> I'd like to let you know about a key that seems to have stayed
>> pressed a quarter of a second too long! ...
>>
>>   Téléchargement récursif:
>>     -r, --recursive          spécifer un téléchargement récursif.
>>     -l, --level=NOMBRE       *profondeeur* maximale de récursion (inf ou 0
>>
>> Just one "e" to remove from "profondeeur", and it'll be fixed!
>
> Indeed, thanks!
>
> Micah, instead of "profondeeur" it should be "profondeur". Where do
> you forward that info, the French GNU translation team?
> (./po/fr.po, around line 1472)

Yup. The mailing address for the French translation team is at [EMAIL PROTECTED]. The team page is http://translationproject.org/team/fr.html; other translation teams are listed at http://translationproject.org/team/index.html

Looks like it's still present in the latest fr.po file at http://translationproject.org/latest/wget/fr.po

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIoIl77M8hyUobTrERApRkAJsGUybOJEDvYidFXc9OWLJ7gIX66QCeL8we
UsjynplN9Um1gmmWUcyZMbU=
=lqbw
-END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Rick Nakroshis wrote:
> Micah,
>
> If you will excuse a quick question about Wget, I'm trying to find out
> if I can use it to download a page from Yahoo that requires me to be
> logged in using my Yahoo profile name and password. It's a display of
> a CSV file, and the only wrinkle is trying to get past the Yahoo
> login.
>
> Try as I may, I just can't seem to find anything about Wget and Yahoo.
> Any suggestions or pointers?

Hi Rick,

In the future, it's better if you post questions to the mailing list at wget@sunsite.dk; I don't always have time to respond.

The easiest way to do what you want may be to log in using your browser, and then tell Wget to use the cookies from your browser, using - --load-cookies=path-to-browser's-cookies. Of course, this only works if your browser saves its cookies in the standard text format (Firefox prior to version 3 will do this), or can export to that format. (Note that someone contributed a patch to allow Wget to work with Firefox 3 cookies; it's linked from http://wget.addictivecode.org/. It's unofficial, so I can't vouch for its quality.)

Otherwise, you can perform the login using Wget, saving the cookies to a file of your choice, using --post-data=..., --save-cookies=cookies.txt, and probably --keep-session-cookies. This will require that you know what data to place in --post-data, which generally requires that you dig around in the HTML to find the right form field names, and where to post them.

For instance, if you find a form like the following within the page containing the log-in form:

  <form action="/doLogin.php" method="POST">
    <input type="text" name="s-login">
    <input type="password" name="s-pass">
  </form>

then you need to do something like:

  $ wget --post-data='s-login=USERNAME&s-pass=PASSWORD' \
      --save-cookies=my-cookies.txt --keep-session-cookies \
      http://HOSTNAME/doLogin.php

(Note that you _don't_ necessarily send the information to the page that had the login form: you send it to the spot mentioned in the "action" attribute of the form.)

Once this is done, you _should_ be able to perform further operations with Wget as if you're logged in, by using

  $ wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt \
      --keep-session-cookies ...

(I'm going to put this up on the Wgiki FAQ now, at http://wget.addictivecode.org/FrequentlyAskedQuestions)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIn09A7M8hyUobTrERAu04AJ9EgRoBBhvNCDwOt87f91p+HpWktACdFgMM
KEfliBtfrPBbh/XdvusEPiw=
=qlGZ
-END PGP SIGNATURE-
Re: AW: AW: Problem mirroring a site using ftp over proxy
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Juon, Stefan wrote:
> The point is that wget sends an http request rather than a pure ftp
> command (GET ftp://ftpde.nai.com/CommonUpdater/ HTTP/1.0), which
> causes the proxy to send back an index.html. Do you agree?

Well of course it does: it's using an HTTP proxy. How do you send FTP commands over HTTP?

The problem isn't that the result is an HTML file; the problem is that the proxy sends an HTML file that Wget apparently can't parse. Perhaps the proxy's not really sending an HTML file at all, which would be unusual (but I'm not sure there are standards governing how FTP gets proxied across HTTP), in which case Wget would need to be modified to check whether the proxied results are a listing file.

But until you show us what index.html file Wget is getting, I don't see how we can help.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFInJpC7M8hyUobTrERAhGtAJ9/cY3nJk8xf1oWb+KCH8mQ54nXNACgg/is
xD3eHrajIfnUDaRhnFI+X+s=
=g1QP
-END PGP SIGNATURE-
Re: [PATCH] 1.11.4: Add missing $(datarootdir)
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Maciej W. Rozycki wrote:
> Hello,
>
> Here is a change that adds $(datarootdir) throughout, which has been
> missed despite the prominent warning output by ./configure. :-(
>
> 2008-08-09  Maciej W. Rozycki  [EMAIL PROTECTED]
>
>   * Makefile.in (datarootdir): Add definition.
>   * doc/Makefile.in (datarootdir): Likewise.
>   * src/Makefile.in (datarootdir): Likewise.
>   * tests/Makefile.in (datarootdir): Likewise.
>
> Please apply.
>
> Maciej

Hi Maciej,

We're not anticipating any further 1.11.x releases for Wget. Active development for most of the last year has focused on 1.12, which is based on Automake (so we get datarootdir for free). But if any significant bugs are found in 1.11.4 that warrant a new 1.11.x release, we'll add this patch in.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFInM4p7M8hyUobTrERAm6zAJ4sLyVEIkq/VVQ2XKylIKPDrNewSwCfUsIH
rFK6XiRKYgVo/yZiU8Nf2iI=
=gLU4
-END PGP SIGNATURE-
Connection management and pipelined Wget
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah Cowan wrote:
> * A "getter" command is mentioned more than once in the above. Note
>   that this is not mutually exclusive with the concept of letting a
>   single process govern connection persistence, which would handle the
>   real work; the getter would probably be a tool for communicating
>   with the main driver.
> ...
>   - Using existing tools to implement protocols Wget doesn't
>     understand (want scp support? Just register it as an scp:// scheme
>     handler), and instantly add support to Wget for the latest,
>     greatest protocols without hacking Wget or waiting until we get
>     around to implementing it.

Of course, one drawback is that it then becomes difficult to sanely handle a feature for multiple simultaneous connections, or even persistent connections, when outside programs come into play. Using a getter we have control over, that can communicate with a connection-managing program, would allow this to work; but that won't work with outside programs that aren't in the know, such as the scp command, or other getter programs. You can fork multiple scps for multiple connections, but what will keep the number of simultaneous connections to a reasonable limit?

Plus, even the idea of our own getter program communicating via a Unix socket or some such to a connections-manager program irks me: it obliterates the independence that makes pipelines useful. I guess, to be useful, a pipelined Wget would need to have wholly independent tools; but the loss of persistent connections would be too great a loss to bear, I think (not that Wget handles them particularly well now: HTTP/1.1 should significantly improve that, though).

Still, there were already plans to allow arbitrary content-handler commands, and URL filters; we can certainly continue to move in that direction. We could still split off the HTML and CSS parsers as completely autonomous (and interchangeable with alternatives) programs. But it seems to me that content-_fetching_ (protocol support) will need to continue to be fully integrated in Wget's core. Decisions on whether URLs are followed or not could also be outsourced.

Previously, I said that we might lose Windows support by making Wget more pipeline-y; but that's not necessarily true. It's just harder to implement in Windows, but it can be done. Hell, if need be, we could have Wget write input to a file, then have the parser read it and spit out another file. That's obviously lame, but OTOH it's how Wget already parses HTML currently (except that no additional programs are used). I suspect, though, that such a program would see a Unix-oriented release some time before the Windows port would appear; unless there were ongoing collaboration on a Windows port simultaneous with the Unix-ish development.

If in fact everything except for connections could be handled as an external command, then there might be little advantage to be gained by library-izing Wget, and it might make more sense to leave Wget as a program, and let connection handlers be plugins (which are expected to use Wget's connection management system, rather than direct connections).

Such a project should still probably get a new name (I was going to say "be a fork", but it'd probably be a rearchitecture anyway, with little in common with current Wget); Wget proper should continue to be a project that appeals to folks that need a tool that's sufficiently lightweight to install as a core system component, without a lot of fluff (or at least, not too much more fluff than it already has).

BTW, I added a couple new name concepts to http://wget.addictivecode.org/Wget2Names: xget (x being the letter after w), and niwt (which I like best so far: Nifty Integrated Web Tools).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIm2TG7M8hyUobTrERArRCAJwLkozlzfxEDJcJWBQDiHun6KoMfACeMI61 m7NvCrQ7XAIHTuW7Y9+6wCg= =yeUz -END PGP SIGNATURE-
Re: Connection management and pipelined Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Daniel Stenberg wrote: On Thu, 7 Aug 2008, Micah Cowan wrote: niwt (which I like best so far: Nifty Integrated Web Tools). But the grand question is: how would that be pronounced? Like newt? :-) That was my thinking :) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIm2cl7M8hyUobTrERAt33AJ4xEts7QxviDOjRx7L83fr6QkFwrwCbBXy5 MgYGOL0OJRsg5+IpPEI0djY= =dzkE -END PGP SIGNATURE-
Re: AW: Problem mirroring a site using ftp over proxy
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Well, considering that FTP proxied over HTTP is working fine for me, it's probably more a matter of the index.html file that's generated by the proxy (since one can't do a true LIST over a proxy). Perhaps you could supply the index.html files that are being generated (be sure to clean out any sensitive info first). It might also be informative to know what server program is doing the proxying.

- -Micah

Juon, Stefan wrote:
> ...problem exists also with version 1.11.4. So what might cause wget
> not to download the files even though it has performed a LIST?
>
> Thanks, Stefan
>
>> Juon, Stefan wrote:
>>> Hi there
>>>
>>> I'm trying to mirror an ftp site over a proxy (Sun Java Webproxy
>>> 4.0.4) using this wget command:
>>>
>>>   export ftp_proxy=http://proxy.company.com:8080
>>>   wget --follow-ftp --passive-ftp --proxy=on --mirror \
>>>     --output-file=./logfile.wget ftp://ftpde.nai.com/CommonUpdater
>>
>> What version of Wget are you running? If it's not the latest, please
>> try the current 1.11.4 release. Please also try the --debug option,
>> to see if Wget gives you more information.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIm2fF7M8hyUobTrERAv/BAJ9biwIIUFaIWZ9Ds7IZxiGAKriA7wCeJtn1
lYdaP8hzodianPg1Bp6b6gk=
=+HQo
-END PGP SIGNATURE-
Re: WGET Date-Time
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Andreas Weller wrote:
> Hi!
> I use wget to download files from an ftp server in a bash script. For
> example:
>
>   touch last.time
>   wget -nc ftp://[]/*.txt .
>   find -newer last.time
>
> This fails if the files on the FTP server are older than my last.time.
> So I want wget to set the file date/time to the local creation time,
> not the server's... How to do this?

You can't, currently. This behavior is intended to support Wget's timestamping (-N) functionality.

However, I'd accept a patch for an option that disables this.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIm2si7M8hyUobTrERAi9AAJ0f8TUv7TJR6tFsgc4k174rqH6OlgCghCzz
xpemaFdQhODIm0SGp7rJSRA=
=vDKD
-END PGP SIGNATURE-
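Until such an option exists, one workaround is simply to re-stamp the downloaded files with the local time after wget finishes, so that the script's "find -newer" logic behaves as intended. A sketch under stated assumptions: the dl/ directory, file names, and dates are stand-ins, and the real wget -nc fetch is simulated by the commented line:

```shell
# Sketch: reset the server-supplied mtimes to local "now" after the
# download, so that "find -newer last.time" picks the files up.
mkdir -p dl
touch -t 200001010000 dl/old.txt   # simulates a file wget stamped with an old server date
touch -t 200001020000 last.time    # the script's pre-download marker
# ... wget -nc 'ftp://server/*.txt' would have run here ...
find dl -name '*.txt' -exec touch {} +   # re-stamp with local time
find dl -type f -newer last.time         # -> dl/old.txt
```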
Re: Problem mirroring a site using ftp over proxy
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Juon, Stefan wrote: Hi there I'm trying to mirror a ftp site over a proxy (Sun Java Webproxy 4.0.4) using this wget-command: export ftp_proxy=http://proxy.company.com:8080 wget --follow-ftp --passive-ftp --proxy=on --mirror --output-file=./logfile.wget ftp://ftpde.nai.com/CommonUpdater What version of Wget are you running? If it's not the latest, please try the current 1.11.4 release. Please also try the --debug option, to see if Wget gives you more information. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFImVZ77M8hyUobTrERAgS7AJ4lWgDuBJonnms+gkriGTZ7LlA4TwCfeNqo jOtcPq60sVWXb9CA1n6FSnI= =Z/D4 -END PGP SIGNATURE-
Re: Wget scriptability
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dražen Kačar wrote:
> Micah Cowan wrote:
>> Okay, so there's been a lot of thought in the past, regarding better
>> extensibility features for Wget. Things like hooks for adding support
>> for traversal of new Content-Types besides text/html, or adding some
>> form of JavaScript support, or support for MetaLink. Also, support
>> for being able to filter results pre- and post-processing by Wget:
>> for example, being able to do some filtering on the HTML to change
>> how Wget sees it before parsing for links, but without affecting the
>> actual downloaded version; or filtering the links themselves to alter
>> what Wget fetches.
>>
>> However, another thing that's been vaguely itching at me lately, is
>> the fact that Wget's design is not particularly unix-y. Instead of
>> doing one thing, and doing it well, it does a lot of things, some
>> well, some not.
>
> It does what various people needed. It wasn't an exercise in writing a
> unixy utility. It was a program that solved real problems for real
> people.
>
>> But the thing everyone loves about Unix and GNU (and certainly the
>> thing that drew me to them), is the
>> bunch-of-tools-on-a-crazy-pipeline paradigm,
>
> I have always hated that. With a passion.

A surprising position from a user of Mutt, whose excellence is due in no small part to its ability to integrate well with other command utilities (that is, to pipeline).

The power and flexibility of pipelines is extremely well-established in the Unix world; I feel no need whatsoever to waste breath arguing for it, particularly when you haven't provided the reasons you hate it. For my part, I'm not exaggerating that it's single-handedly responsible for why I'm a Unix/GNU user at all, and why I continue to highly enjoy developing on it.

  find . -name '*.html' -exec sed -i \
    's#http://oldhost/#http://newhost/#g' '{}' \;

  ( cat message; echo; echo '-- '; cat ~/.signature ) | \
    gpg --clearsign | mail -s 'Report' [EMAIL PROTECTED]

  pic | tbl | eqn | eff-ing | troff -ms

Each one of these demonstrates the enormously powerful technique of using distinct tools, with distinct feature domains, together to form a cohesive solution for the need. The best part is that (with the possible exception of the troff pipeline) each of these components is immediately available for use in some other pipeline that does some other, completely different function.

Note, though, that I don't intend that using "Piped-Wget" would actually mean the user types in a special pipeline each time he wants to do something with it. The primary driver would read in some config file that would tell wget how it should do the piping. You just tweak the config file when you want to add new functionality.

>> - The tools themselves, as much as possible, should be written in an
>>   easily-hackable scripting language. Python makes a good candidate.
>>   Where we want efficiency, we can implement modules in C to do the
>>   work.
>
> At the time Wget was conceived, that was Tcl's mantra. It failed
> miserably. :-)

Are you claiming that Tcl's failure was due to the ability to integrate it with C, rather than its abysmal inadequacy as a programming language (changing it from an ability to integrate with C, to an absolute requirement to do so in order to get anything accomplished)?

> How about concentrating on the problems listed in your first paragraph
> (which is why I quoted it)? Could you show us how a bunch of shell
> tools would solve them? Or how a librarized Wget would solve them? Or
> how any other paradigm or architecture or whatever would solve them?

It should be trivially obvious: you plug them in, rather than wait for the Wget developers to get around to implementing it. The thing that both a library-ized Wget and a pipeline-ized Wget would offer is the same: extreme flexibility. It puts the users in control of what Wget does, rather than just perpetually hearing, "sorry, Wget can't do it: you could hack the source, though." :p

The difference between the two is that a pipelined Wget offers this flexibility to a wider range of users, whereas a library Wget offers it to C programmers.

> Or how would you expect to do these things without a library-ized (at
> least) Wget?

Implementing them in the core app (at least by default) is clearly wrong (scope bloat). Giving Wget a plugin architecture is good, but then there's only as much flexibility as there are hooks. Library-izing Wget is equivalent to providing everything as hooks, and puts the program using it in the driver's seat (and, naturally, there'd be a wrapper implementation, like curl for libcurl). A suite of interconnected utilities does the same, but is more accessible to greater numbers of people. Generally at some expense to efficiency (aren't all flexible architectures?); but Wget isn't CPU-bound, it's network-bound.

As mentioned in my original post, this would be a separate project from Wget. Wget would not be going away (though it seems likely to me that it would quickly reach a primarily
Wget scriptability
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Okay, so there's been a lot of thought in the past, regarding better extensibility features for Wget. Things like hooks for adding support for traversal of new Content-Types besides text/html, or adding some form of JavaScript support, or support for MetaLink. Also, support for being able to filter results pre- and post-processing by Wget: for example, being able to do some filtering on the HTML to change how Wget sees it before parsing for links, but without affecting the actual downloaded version; or filtering the links themselves to alter what Wget fetches. The original concept before I came onboard, was plugin modules. After some thought, I'd decided I didn't like this overly much, and have mainly been leading toward the idea of a next-gen Wget-as-a-library thing, probably wrapping libcurl (and with a command-client version, like curl). This obviously wouldn't have been a Wget any more, so would have been a separate project, with a different name. However, another thing that's been vaguely itching at me lately, is the fact that Wget's design is not particularly unix-y. Instead of doing one thing, and doing it well, it does a lot of things, some well, some not. So the last couple days I've been thinking, maybe wget-ng should be a suite of interoperating shell utilities, rather than a library or a single app. This could have some really huge advantages: users could choose their own html-parser to use, they can plug in parsers for whatever filetypes they desire, people who want to implement exotic features can do that... Of course, at this point we're talking about something that's fundamentally different from Wget. Just as we were when we were considering making a next-gen library version. It'd be a completely separate project. And I'm still not going to start it right away (though I think some preliminary requirements and design discussions would be a good idea). 
Wget's not going to die, nor is everyone going to want to switch to some new-fangled re-envisioning of it. But the thing everyone loves about Unix and GNU (and certainly the thing that drew me to them) is the bunch-of-tools-on-a-crazy-pipeline paradigm, which is what enables you to mix and match different tools to cover the different areas of functionality. Wget doesn't fit very well into that scheme, and I think it could become even more powerful than it already is by being broken into smaller, more discrete projects. Or, to be more precise, by offering an alternative that does the equivalent.

So far, the following principles have struck me as advisable for a project such as this:

- The tools themselves, as much as possible, should be written in an easily hackable scripting language. Python makes a good candidate. Where we want efficiency, we can implement modules in C to do the work.

- While efficiency won't be the highest priority (else we'd just stick to the monolith), it's still important. Spawning off separate processes to each fetch their own page, initiating a new connection each time, would be a lousy idea. So the architectural model should center around a URL-getter driver that manages connections and such, reusing persistent ones as much as possible. Of course, there might be distinct commands to handle separate types of URLs (or alternative methods for handling them, such as MetaLink), and perhaps not all of these would be able to do persistence (a dead-simple way to add support for scp, etc., might be to simply call the command-line program).

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
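As a toy illustration of the tools-on-a-pipeline idea, here's what link extraction and filtering already look like with today's standard utilities (no hypothetical wget-ng commands involved; the page content is made up for the demo):

```shell
# Pipeline sketch: a link extractor, a filter, and (in real use) a fetcher.
cat > page.html <<'EOF'
<a href="/b/res/97867797.html">thread</a>
<a href="/b/imgboard.html">index</a>
EOF
# Pull out the href values, then keep only those with /res/ in the path.
grep -o 'href="[^"]*"' page.html \
  | sed 's/^href="//; s/"$//' \
  | grep '/res/'
# In real use, the surviving URLs could feed a fetcher, e.g. wget -i -
```

Each stage is replaceable independently, which is exactly the flexibility argument above.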
Re: wget does not like this URL
Kevin O'Gorman wrote: Is there a reason I get this?

$ wget -O foo "http://www.littlegolem.net/jsp/info/player_game_list_txt.jsp?plid=1107&gtid=hex"
Cannot specify -r, -p or -N if -O is given.
Usage: wget [OPTION]... [URL]...

While I do have -O, I don't have the ones it seems to think I've specified. Without the -O foo it works fine, but of course puts the results in a different place. I get the same error message if I use the long-form parameter.

You most likely have timestamping=on in your wgetrc. Combining -N and -O was disallowed for version 1.11, but was re-enabled for 1.11.3 (I think) with a warning. The latest version of Wget is 1.11.4.
Re: propose new feature: loading cookies from Firefox 3.0 cookies.sqlite
宋浩 wrote: Hi, folks: I'm currently using Firefox 3.0 on my Ubuntu 8.04 system. The browser saves its cookie file in the same directory as its predecessor, Firefox 2.x, but in a SQLite database file called cookies.sqlite instead of a textual file. I want to add support for this new cookie file format to wget. The coding is almost done. I'd like to know if anyone else is also working on this.

To be honest, I'd prefer to avoid a dependency on sqlite in Wget, even a configurable one. I'd much prefer to see a solution based on a separate program that converts from cookies.sqlite to a cookies.txt file. Besides, that solution would work with more tools than Wget ("do one thing, and do it well"¹).

¹ Not that Wget adheres particularly well to that philosophy...

Lest you think I'm just being unfeeling toward your needs, I should point out that I'm also running Ubuntu 8.04, and have found the sqlite-based cookies files a supreme annoyance. I'd just prefer a more general, scriptable solution. However, if you choose to complete this work (you said you're nearly done), I won't mind if you place a link to your patch on the Wiki front page (http://wget.addictivecode.org/FrontPage).
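A separate converter along the lines suggested above could be quite small. This is a rough Python sketch (the function name is made up here; the column names follow the Firefox 3 moz_cookies schema, and the output is Netscape cookies.txt format, which wget reads via --load-cookies):

```python
# Sketch: dump Firefox 3's cookies.sqlite into Netscape cookies.txt format.
# Illustration only, not a vetted tool.
import sqlite3

def sqlite_to_cookies_txt(sqlite_path, txt_path):
    con = sqlite3.connect(sqlite_path)
    rows = con.execute(
        "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies")
    with open(txt_path, "w") as out:
        out.write("# Netscape HTTP Cookie File\n")
        for host, path, secure, expiry, name, value in rows:
            # Second field is TRUE when the cookie applies to subdomains,
            # which the Netscape format ties to a leading dot on the domain.
            subdomains = "TRUE" if host.startswith(".") else "FALSE"
            out.write("\t".join([host, subdomains, path,
                                 "TRUE" if secure else "FALSE",
                                 str(expiry), name, value]) + "\n")
    con.close()
```

Run it once, then point wget at the result with --load-cookies=cookies.txt; and, per the "do one thing well" point, the same file works for any other cookies.txt-aware tool.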
Re: wget-1.11.4 bug
kuang-cheng chao wrote: Dear Micah: Thanks for your work on wget. There is a question about two wgets running simultaneously. In the method resolve_bind_address, wget assumes that it is called once. However, this could cause two domain names to get the same IP if two wgets run the same method concurrently.

Have you reproduced this, or is this in theory? If the latter, what has led you to this conclusion? I don't see anything in the code that would cause this behavior.

Also, please use the mailing list for discussions about Wget. I've added it to the recipients list.
Re: wget-1.11.4 bug
k.c. chao wrote: Micah Cowan wrote: Have you reproduced this, or is this in theory? If the latter, what has led you to this conclusion? I don't see anything in the code that would cause this behavior.

I reproduced it, but I can't be sure the real problem is in resolve_bind_address. In the attached message, both api.yougotphogo.com and farm1.static.flickr.com get the same IP (74.124.203.218). The two wgets are called from two threads of a program.

Yeah, I get 68.142.213.135 for the flickr.com address, currently. The thing is, though, those two threads should be running wgets under separate processes (I'm not sure how they couldn't be, but if they somehow weren't, that would be using Wget other than how it was designed to be used). This problem sounds much more like an issue with the OS's API than an issue with Wget, to me. But we'd still want to work around it if it were feasible. What operating system are you running? Vista?
Re: Patch to allow filtering on content-type header
Lars Kotthoff wrote: Hi list, I've written a patch which allows filtering on the Content-Type header to select what is downloaded. E.g.,

wget -r --content-type='text/*' http://www.foobar.com

will only download things with a Content-Type header of text/html, text/plain, etc. There's also a content-type-exclude option to avoid downloading specific content types.

Sounds great, Lars! In fact, we already have an RFE on the bug tracker for just such a thing at https://savannah.gnu.org/bugs/?20378; if you'd like to attach it there, that'd be great.
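The matching such options need amounts to shell-style wildcard comparison against the header value. A rough sketch of that decision in plain shell (hypothetical helper, hard-coding the text/* accept pattern; the actual patch lives on the Savannah tracker):

```shell
# Sketch of Content-Type accept matching in the spirit of --content-type.
# Shell 'case' patterns behave like the wildcards in 'text/*'.
content_type_allowed() {
  ctype=${1%%;*}            # strip parameters such as '; charset=utf-8'
  case $ctype in
    text/*) return 0 ;;     # matches the accepted pattern
    *)      return 1 ;;
  esac
}
content_type_allowed 'text/html; charset=utf-8' && echo accepted
```

A real implementation would of course take the accept and exclude patterns from the command line rather than hard-coding them.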
Re: trouble with -p
Brian Keck wrote: (It also renames diggthis.js to diggthis.js.html, but I don't care about that.)

That's an indication that the server is misconfigured, and is serving diggthis.js as text/html, rather than text/javascript or text/x-javascript.
Re: Wget
Hor Meng Yoong wrote: Hi: I understand that you are a very busy person. Sorry to disturb you.

Hi; please use the mailing list for support requests. I've copied the list in my response.

I am using wget to mirror (using ftp://) a user home directory from a unix machine. Wget defaults to the user's home directory. However, I also need to get the /etc folder, so I tried to use ../../../etc. It works, but the resulting ftp'd files end up in %2E%2E/%2E%2E/%2E%2E. Is there any means to overcome this, or to rename the directory?

Try the -nd option (you may also need -nH). You might prefer to fetch /etc in a separate invocation from the other things, perhaps with the -P option to specify a directory name.
Re: trouble with -p
Brian Keck wrote: Hello, if you do

wget http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get an HTML file called iPhone3G. But if you do

wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G

then you get a directory called iPhone3G. This makes sense if you look at the links in the HTML file, like /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg. But of course I want both. Is there a way of getting wget -p to do something clever, like renaming the HTML file? I've looked through wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source without seeing anything relevant.

That strikes me as not quite right. If Wget sees http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use a file name. What's more, if it later sees the URL with the slash, it will fail to create a directory at all, since a file already exists with that pathname.

I'm not sure what you mean by "I want both". You can't possibly have a regular file named iPhone3G and another file named iPhone3G/images/...; it can't be both a file and a directory at once. If you specify the link with a trailing slash, then Wget will realize iPhone3G is a directory, and will store the file it finds there as iPhone3G/index.html. You're out of luck, though, if some links refer to it with, and some without, the trailing slash, on a server that doesn't redirect to the slash version (like Apache does).
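The "can't be both" constraint is a filesystem one, not a Wget one, which a couple of commands can demonstrate (illustrative only, using a scratch directory):

```shell
# A path can't be both a regular file and a directory at once -- which is
# why both iPhone3G and iPhone3G/images/... can't be stored side by side.
dir=$(mktemp -d)
: > "$dir/iPhone3G"                      # the page, saved as a plain file
mkdir -p "$dir/iPhone3G/images" 2>/dev/null \
  || echo "cannot create directory: iPhone3G already exists as a file"
```

Any fix therefore has to rename one side or the other, which is what the Savannah item mentioned below proposes.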
Re: trouble with -p
James Cloos wrote: "Micah" == Micah Cowan writes: Micah> I'm not sure what you mean by "I want both". He means that, when the -p option is given, he wants to mangle either the created filename or the created directory name, so that both do in fact get created on the filesystem and all related files get saved. Perhaps delaying the initial open(2) until after parsing the first document, and then pretending that the initial URL had a trailing solidus, might work?

Not possible with the current architecture. And that wouldn't solve the problem if it happens not to appear that way in the links immediately contained within. https://savannah.gnu.org/bugs/index.php?23756 covers my solution for handling this. The easy workaround for now, though, would be to supply the URL with the solidus in the first place; though, as mentioned, I'm not sure that will work if Wget then later encounters a version without the solidus.
Re: rapidshare download problem
Doruk Fisek wrote: Hi, I'm having trouble with cookieless downloading from rapidshare with the latest version of wget. When I use a URL like http://username:[EMAIL PROTECTED]/files/30168760/Rapidshare_EN.txt, wget 1.10.2 downloads it just fine, but wget 1.11.4 brings back an HTML page instead.

See if --auth-no-challenge fixes it for you.
Building [Re: CSS support now in mainline]
Micah Cowan wrote: I'm pleased to report that the paperwork has been finalized for the assignment of copyright over Ted Mielczarek's CSS support to the FSF. That support has now been merged into the mainline repository, and the separate css repository has been removed.

Note that this introduces a new build requirement when building from the repo: flex (or lex) is now required.
Re: WGET bug...
HARPREET SAWHNEY wrote: Hi, I am getting a strange bug when I use wget to download a binary file from a URL, versus when I download it manually. The attached ZIP file contains two files: 05.upc (manually downloaded) and dum.upc (downloaded through wget). wget adds a number of ASCII characters to the head of the file, and seems to delete a similar number from the tail, so the file sizes are the same, but the addition and deletion render the file useless. Could you please direct me on whether I should be using some specific option to avoid this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at the front of your session, and then giving you the file contents anyway. It's a bug in the PHP code that serves the file. You're getting this extra content because you are not logged in when you're fetching it. You need to have Wget send a cookie with the login-session information, and then the server will probably stop sending the corrupting information at the head of the file.

The site does not appear to use HTTP's authentication mechanisms, so the [EMAIL PROTECTED] bit in the URL doesn't do you any good. It uses forms-and-cookies authentication. Hopefully you're using a browser that stores its cookies in a text format, or that is capable of exporting to a text format. In that case, you can just ensure that you're logged in in your browser, and use the --load-cookies=cookies.txt option to Wget to use the same session information. Otherwise, you'll need to use --save-cookies with Wget to simulate the login form post, which is tricky and requires some understanding of HTML forms.

HTH,
Re: WGET bug...
HARPREET SAWHNEY wrote: Hi, Thanks for the prompt response. I am using GNU Wget 1.10.2. I tried a few things on your suggestion, but the problem remains. 1. I exported the cookies file in Internet Explorer and specified that on the Wget command line, but the same error occurs. 2. I have an open session on the site with my username and password. 3. I also tried running wget while I am downloading a file from the IE session on the site, but got the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to log in to the website. This requires site-specific information from the user-login form page, though, so I can't help you without that. If you know how to read some HTML, then you can find the HTML form used for posting the username/password, and use

wget --keep-session-cookies --save-cookies=cookies.txt --post-data='USERNAME=foo&PASSWORD=bar' ACTION

where ACTION is the value of the form's action field, USERNAME and PASSWORD (and possibly further required values) are field names from the HTML form, and foo and bar are the username and password.
CSS support now in mainline
I'm pleased to report that the paperwork has been finalized for the assignment of copyright over Ted Mielczarek's CSS support to the FSF. That support has now been merged into the mainline repository, and the separate css repository has been removed.
[RESOLVED] Re: Release: GNU Wget 1.11.4
Robert Denton wrote: The other guy's trick of sending the unsubscribe request from a different email address worked! Now that I am unsubscribed, however, I cannot share that with the list. Would you do the honors for me? Thanks! Robert

Thanks, Doug, for pointing that out.
Re: Mailing list migration?
Madhusudan Hosaagrahara wrote: Hi Micah, My suggestion would be to choose the option that minimizes the amount of time and effort required to maintain these lists. What do you think of using an external tool like https://savannah.gnu.org/maintenance/ListServer, or offloading mail to third-party apps like http://www.google.com/a/help/intl/en/index.html or http://smallbusiness.officelive.com/GetOnline/Domain? Last, I'm curious whether any attempts have been made to get http://wget.org. ~Madhu.

The Savannah one, I believe, is an interface to the existing [EMAIL PROTECTED] (though the latter predates the former). I didn't realize that shell access was a possibility. It's not root, but it's nice to have. I'd probably prefer to use GNU's over Google's. As to wget.org, it looks like it's registered to someone in China; I don't think I'm going to spend much effort trying to get it.
Release: GNU Wget 1.11.4
Announcing GNU Wget 1.11.4, a bugfix release.

The source code is available at:
- http://ftp.gnu.org/gnu/wget/
- ftp://ftp.gnu.org/gnu/wget/

Documentation is at:
- http://www.gnu.org/software/wget/manual/

More information about Wget is on the official GNU web page at http://www.gnu.org/software/wget/, and on the Wget Wgiki, http://wget.addictivecode.org/

Here are the relevant NEWS entries:

* Changes in Wget 1.11.4

** Fixed an issue (apparently a regression) where -O would refuse to download when -nc was given, even though the file didn't exist.

** Fixed a situation where Wget could abort with --continue if the remote server gives a content-length of zero when the file exists locally with content.

** Fixed a crash on some systems, due to Wget casting a pointer-to-long to a pointer-to-time_t.

** Translation updates for Catalan.
Re: Release: GNU Wget 1.11.4
Robert Denton wrote: Hi, I have sent a few emails to [EMAIL PROTECTED], but they keep bouncing (blocked by SpamAssassin). Is there any other way to get off this list? Thanks!

I'm afraid there's nothing we can do here. :\ Please contact [EMAIL PROTECTED] to fix this.
Mailing list migration?
I'm thinking it may be appropriate at this time to broach the subject of a mailing list migration. In the year that I've been the maintainer of GNU Wget, the mailing list at dotsrc.org has gone down twice, for several days at a time, and I'm concerned about whether we might expect further such difficulties in the future. When we do have issues, there's a tendency for responses to be a bit slow. This is understandable, as dotsrc is a small, volunteer-run organization serving the needs of many projects. But it would be nice to have more direct control over the service: for instance, to unsubscribe people when they have trouble doing so themselves (and, perhaps, to ensure that the spam blocker never affects unsubscribe attempts from subscribed addresses). Though it hasn't proven to be a problem yet, I think it would also be helpful to have unsubscribe or moderation ability, in the event that some threads or posters get a little out of hand.

The downsides, of course, would be the temporary pain of moving to a new address, the potential loss of some subscribers in the move, and moving the current archives over to the new mailing list. The ideal upsides would be more reliable service, and more direct control over the subscription list and spam controls.

The two possibilities I can think of are:

- Set up a new mailing list at addictivecode.org (my VPS, where the Wiki and source repos live). The infrastructure is there already (it's being used for [EMAIL PROTECTED]; there was also a wget-committers list for folks with commit access, which is no longer used). This has the advantage that the Wget maintainer will have root access (so long as it continues to be me ;). The disadvantages are that I may not have the time to spend that a dedicated sysadmin might, and I'm not sure what kind of uptime I can guarantee, as services tend to drop (OOM-killed) when Apache gets hit hard. There are ways around this, but I haven't had time to look seriously at them. So far, though, my uptimes have been a bit better than dotsrc's, at least.

- Use [EMAIL PROTECTED] as the primary mailing list once again, and ask the dotsrc folks to forward wget@sunsite.dk there. This has the advantage that I will have control over the subscription list and various other admin-level things (I hope?), and the GNU admins can probably do a better job (maybe?) than either I or the dotsrc folks can at keeping services running smoothly.

What do y'all think?
Re: No downloading
On Sun, Jun 29, 2008 at 1:42 PM, Mishari Almishari [EMAIL PROTECTED] wrote: Hi, I want to download the website www.2006election.net. For that, I used the command

wget -d -nd -p -E -H -k -K -S -R png,gif,jpg,bmp,ico --ignore-length --user-agent=Mozilla -e robots=off -P www.2006election.net -o www.2006election.net.out http://www.2006election.net

But the downloaded page index.html has no content (except body/head tags), even though I can see the content when I use Internet Explorer.

mm w wrote: the default index is not named index, or there is an HTTP test server-side regarding HTTP_USER_AGENT.

The first one could not possibly cause problems, since he's not requesting any URLs with index.html in them. The HTTP_USER_AGENT thing is the problem. Mishari tried to specifically handle this with the --user-agent option, but it apparently wasn't convincing enough. I got it to work with:

--user-agent='Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)'
Re: No downloading
Petr Pisar wrote: On 2008-06-29, Mishari Almishari [EMAIL PROTECTED] wrote: Hi, I want to download the website www.2006election.net. But the downloaded page index.html has no content (except body/head tags), even though I can see the content when I use Internet Explorer.

This is not a bug, that's a feature. All the content you see in IE is generated by JavaScript. See the source code of the web page in IE.

No, the command he gives literally yields a completely empty web page:

<html> <body> </body> </html>
Re: Handling Ajax (was Re: No downloading)
Paul King wrote: I just want to de-lurk for a minute. I have been using wget on a regular basis for various websites. If JavaScript is responsible for writing the content, then you have a web page that probably uses AJAX, and would be dynamically updateable. Since Ajax use is on the rise, I wonder if anyone here can say how wget deals with sites using Ajax?

Not so well, generally speaking. Wget isn't going to do any JavaScript interpreting on its own, so it really depends. If the JavaScript was written in certain ways, it's possible it will just magically work when you fire it up in your browser. It's not unlikely that it fails miserably. :\ Ultimately, I think it depends on the site.
Re: Wget 1.11.3 - case sensitivity and URLs
Tony Lewis wrote: Coombe, Allan David (DPS) wrote: However, the case of the files on disk is still mixed - so I assume that wget is not using the URL it originally requested (harvested from the HTML?) to create directories and files on disk. So what is it using? An HTTP header (if so, which one?).

I think wget uses the case from the HTML page(s) for the file name; your proxy would need to change the URLs in the HTML pages to lower case too.

My understanding from David's post is that he claimed to have been doing just that: "I modified the response from the web site to lowercase the urls in the html (actually I lowercased the whole response) and the data that wget put on disk was fully lowercased - problem solved - or so I thought." My suspicion is that it's not quite working, though, as otherwise where would Wget be getting the mixed-case URLs?
Re: Wget 1.11.3 - case sensitivity and URLs
Coombe, Allan David (DPS) wrote: OK - now I am confused. I found a Perl-based HTTP proxy (named HTTP::Proxy, funnily enough) that has filters to change both the request and response headers and data. I modified the response from the web site to lowercase the URLs in the HTML (actually, I lowercased the whole response), and the data that wget put on disk was fully lowercased - problem solved - or so I thought. However, the case of the files on disk is still mixed - so I assume that wget is not using the URL it originally requested (harvested from the HTML?) to create directories and files on disk. So what is it using? An HTTP header (if so, which one?).

I think you're missing something on your end; I couldn't begin to tell you what. Running with --debug will likely be informative. Wget uses the URL that successfully results in a file download. If the files on disk have mixed case, then it's because they were the result of a mixed-case request from Wget (which, in turn, must have resulted either from an explicit argument or from HTML content).

The only exception to the above is when you explicitly enable --content-disposition support, in which case Wget will use any filename specified in a Content-Disposition header. Those are virtually never issued, except for CGI-based downloads (and you have to explicitly enable it).

Good luck!
Re: help with accessing Google APIs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Ryan Schmidt wrote: On Jun 20, 2008, at 4:47 PM, [EMAIL PROTECTED] wrote: I get the following error: --17:42:58-- http://ajax.googleapis.com/ajax/services/search/web?v=1.0 = [EMAIL PROTECTED]' Resolving ajax.googleapis.com... 66.102.1.100, 66.102.1.101, 66.102.1.102, ... Connecting to ajax.googleapis.com|66.102.1.100|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 81 [text/javascript] 0K 100% 6.79 MB/s 17:42:58 (6.79 MB/s) - [EMAIL PROTECTED]' saved [81/81] 'q' is not recognized as an external or internal command, operable program or batch file. Your shell appears to think the & in the URL has not been escaped. I'm not sure why it thinks that, since you've enclosed it in single quotes, which should be sufficient. And copying and pasting your command to my terminal (replacing curl with wget) works for me: The fact that Wget transcodes '?' to '@' is a pretty good sign the user is running Windows, so I'm going to assume that. In that case, AIUI, single quotes don't work the same as they would in a Unix shell: the user needs to use double quotes instead. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIXCp47M8hyUobTrERAo2sAJ9+VnsSA74BA9AmfLHqu++TTAgiPACgirot EfLt/jBNKruR8sI/2M/724E= =gUSE -END PGP SIGNATURE-
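For anyone else who lands here: a small illustration of why the unquoted & breaks things, and how the quoting differs between platforms (the URL below is illustrative, not the poster's exact one):

```shell
# An illustrative query URL; the '&' is the character that matters.
url='http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=test'

# Unquoted, a POSIX shell would split the command at '&' and background the
# first half; cmd.exe splits there too, producing the "'q' is not recognized"
# error quoted above.
#
# POSIX shells: single or double quotes both keep the '&' literal:
#   wget 'http://.../web?v=1.0&q=test'
# Windows cmd.exe: single quotes are ordinary characters; double quotes
# are required:
#   wget "http://.../web?v=1.0&q=test"

printf '%s\n' "$url"
```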
Re: Does --page-requisites load content from other hosts?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Stefan Nowak wrote: Does --page-requisites load content from other hosts as well, or must I explicitly issue a --span-hosts with it? The manpage unambiguously says of --span-hosts: Enable spanning across hosts when doing recursive retrieving; but at the --span-hosts section it does not mention whether wget will load from other hosts or only the mother host. Please reply in CC to me, and also update the manpage with the information. --page-requisites invokes a special kind of recursion (and the manpage says this), so the manpage is pretty clear about what's required (i.e., yes, you need --span-hosts, just as you would for -r). The manual makes this even more clear in the following text: Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to -p: wget -E -H -k -K -p http://site/document (Note, btw, that the authoritative source for information about Wget is the Info manual, not the man page.) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIWpve7M8hyUobTrERAihmAJ0Sm0uNTn6WBH69qvmtAUuSZ7n9awCfSZL6 4B0EpM/EaLptbHDM70cJJyo= =x7G7 -END PGP SIGNATURE-
Re: wget doesn't load page-requisites from a) dynamic web page b) through https
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Ryan Schmidt wrote: For example, if you want American English, set LANG to en_US. In the Bash shell, you can type export LANG=en_US In the Tcsh shell, you can type setenv LANG en_US To find out which shell you use, type echo $SHELL FYI: It's not in any current release, but current mainline has support for the special [EMAIL PROTECTED] for LANGUAGE (still may need to set LANG=en_US or something). This causes all quoted strings to be rendered in boldface, using terminal escape sequences. I've found it pleasant to use that setting for my own purposes. The [EMAIL PROTECTED] LANGUAGE setting is also supported (converts to proper left/right-quotemarks, but no terminal sequences); but I've rigged LANG=en_US to have the same effect ([EMAIL PROTECTED] is copied to en_US.po). Again, this is only in the mainline repo, and not in any release. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIWXvT7M8hyUobTrERAmedAJ44nMxqJCyIBox1LDv/FOibkCslIACeLoS3 Beb0toZwvx29J4Sa3AZk62k= =Sreb -END PGP SIGNATURE-
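(The list archiver has redacted the locale names in the message above. My assumption, from the surrounding description, is that they are gettext's special quoting locales, which would make the settings look like the sketch below; treat the exact names as a guess:)

```shell
# Assumed reconstruction of the redacted values (gettext quoting locales):
export LANGUAGE=en@boldquot   # quoted strings shown in bold via terminal escapes
# export LANGUAGE=en@quot     # proper left/right quote marks, no escapes
export LANG=en_US             # may still be needed alongside LANGUAGE
```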
Re: Help with a core dump please
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Valentin wrote: Hi, I'm trying to mirror a site with this command: wget -nd -r -k -p -H -c -T 10 -t 2 http://www.freesfonline.de/Magazines1.html It works fine, until at some point it tries to get http://www.booksense.com/robots.txt and core dumps. When I try downloading just that file there's no problem. Is there some way to increase wget's verbosity or another way of debugging this? I have version 1.10.2-3ubuntu1. FYI, Valentin caught me online for IRC. It looked like a problem we'd fixed in 1.11, but actually, it's still present, and a new bug report has been filed: https://savannah.gnu.org/bugs/?23613 - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIVyiV7M8hyUobTrERAuTaAJ9a61N6txpABBVhizVKYEiAiVVHQgCeP8sY TZe7Qpww5ejINO60c2A9QxM= =88jF -END PGP SIGNATURE-
Re: bug in wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Sir Vision wrote: Hello, entering the following command results in an error: --- command start --- c:\Downloads\wget_v1.11.3bwget ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/; -P c:\Downloads\ --- command end --- wget can't convert the .listing file into an HTML file. As this seems to work fine on Unix, for me, I'll have to leave it to the Windows porting guy (hi Chris!) to find out what might be going wrong. ...however, it would really help if you would supply the full output you got from wget that leads you to believe Wget couldn't do this conversion. In fact, it wouldn't hurt to supply the -d flag as well, for maximum debugging messages. - -- Cheers, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIVKXx7M8hyUobTrERAo40AKCAmwgDOGgjU2kcTYeEGC3+RkCjzQCeJt6B dz38DW8jMMZtUxc+FhvIhfI= =T+mK -END PGP SIGNATURE-
Re: Wget 1.11.3 - case sensitivity and URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tony Lewis wrote: Micah Cowan wrote: Unfortunately, nothing really comes to mind. If you'd like, you could file a feature request at https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option asking Wget to treat URLs case-insensitively. To have the effect that Allan seeks, I think the option would have to convert all URIs to lower case at an appropriate point in the process. I think you probably want to send the original case to the server (just in case it really does matter to the server). If you're going to treat different-case URIs as matching, then the lower-case version will have to be stored in the hash. The most important part (from the perspective that Allan voices) is that the versions written to disk use lower-case characters. Well, that really depends. If it's doing a straight recursive download, without preexisting local files, then all that's really necessary is to do lookups/stores in the blacklist in a case-normalized manner. If preexisting files matter, then yes, your solution would fix it. Another solution would be to scan directory contents for the first name that matches case-insensitively. That's obviously much less efficient, but has the advantage that the file will match at least one of the real cases from the server. As Matthias points out, your lower-case normalization solution could be achieved in a more general manner with a hook, which is something I was planning on introducing perhaps in 1.13 anyway (so you could, say, run sed on the filenames before Wget uses them), so that's probably the approach I'd take. But probably not before 1.13, even if someone provides a patch for it in time for 1.12 (too many other things to focus on, and I'd like to introduce the external command hooks as a suite, if possible). OTOH, case normalization in the blacklists would still be useful, in addition to that mechanism. 
It could make another good addition for 1.13 (because it'll be more useful in combination with the rename hooks). - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIUua+7M8hyUobTrERAr0tAJ98A/WCfPNhTOQ3Xcfx2eWP2stofgCcDUUQ nVYivipui+0TRmmK04kD2JE= =OMsD -END PGP SIGNATURE-
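Until such rename hooks exist, the post-processing Allan wants to avoid can at least be automated; a minimal POSIX-shell sketch (it assumes no two sibling names collide case-insensitively, so it renames but does not merge /Senate and /senate style duplicates):

```shell
# Lower-case every entry under a mirror directory, deepest entries first,
# so files are renamed before the directories that contain them.
lowercase_tree() {
    find "$1" -depth -mindepth 1 | while IFS= read -r path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
        # Only rename when the case actually differs.
        [ "$base" = "$lower" ] || mv "$path" "$dir/$lower"
    done
}
```

Merging genuinely duplicated mixed-case trees would need extra care (mv onto an existing directory moves the source inside it rather than merging).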
Re: Wget 1.11.3 - case sensitivity and URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Allan, You'll generally get better results if you post to the mailing list (wget@sunsite.dk). I've added it to the recipients list. Coombe, Allan David (DPS) wrote: Hi Micah, First some context… We are using wget 1.11.3 to mirror a web site so we can do some offline processing on it. The mirror is on a Solaris 10 x86 server. The problem we are getting appears to be because the URLs in the HTML pages that are harvested by wget for downloading have mixed case (the site we are mirroring is running on a Windows 2000 server using IIS), and the directory structure created on the mirror has 'duplicate' directories because of the mixed case. For example, the URLs in HTML pages /Senate/committees/index.htm and /senate/committees/index.htm refer to the same file, but wget creates two different directory structures on the mirror site for these URLs. This appears to be a fairly basic thing, but we can't see any wget options that allow us to treat URLs case-insensitively. We don't really want to post-process the site just to merge the files and directories with different case. Unfortunately, nothing really comes to mind. If you'd like, you could file a feature request at https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option asking Wget to treat URLs case-insensitively. Finding local files case-insensitively, on a case-sensitive filesystem, would be a PITA; but adding and looking up URLs in the internal blacklist hash wouldn't be too hard. I probably wouldn't get to that for a while, though. Another useful option might be to change the name of index files, so that, for instance, you could have URLs like http://foo/ result in foo/index.htm or foo/default.html, rather than foo/index.html. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. 
http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIUG937M8hyUobTrERAqq2AJ48mGvcFCSxnouTFqYTuRHzVgwYdgCeLegI vkdzf3Lu+Vn5diCOHk5CRhc= =IlG9 -END PGP SIGNATURE-
Re: FW: GNU Coding Standard compliance
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Chris, Wouldn't the Cygwin-specific version be preferable to existing Cygwin users (which would include me, on occasion)? In particular, things like the default --restrict=windows setting might be less than desirable. I imagine most Cygwin users are looking for a Windows Wget that behaves more like Unix Wget (since Cygwin is essentially POSIX for Windows). There's also the fact that Wget-1.10.2 is already a Cygwin package, which could do with updating, and is probably important to the Cygwin set as a whole. Also, I'm not sure that Eric is subscribed to the ML, so he may not have gotten your message (I've added him to the recipients). - -Micah Christopher G. Lewis wrote: Eric - Why are you trying to package Wget for cygwin when there is a *native* win32 exe? Seems like a whole *lot* of work for something that really doesn't gain you anything. I'm quite interested in your response. Chris Christopher G. Lewis http://www.ChristopherLewis.com -Original Message- From: Eric Blake [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 04, 2008 7:52 AM To: [EMAIL PROTECTED] Subject: GNU Coding Standard compliance I'm trying to package wget-1.11.3 for cygwin. But you have several GNU Coding Standard compliance problems that are making this task more difficult than it should be. GCS requires that your testsuite be run by 'make check', but yours is a no-op. Instead, you provide 'make test', but that fails to compile if you use a VPATH build. And even when using an in-tree build, it fails as follows: ./Test-proxied-https-auth.px echo echo /bin/sh: ./Test-proxied-https-auth.px: No such file or directory After commenting that line out, the following tests are also missing: ./Test-proxy-auth-basic.px ./Test-N-current-HTTP-CD.px Test-N-HTTP-Content-Disposition.px fails, since it didn't add the --content-disposition flag to the wget invocation. 
Several Test--spider-* tests fail, because an expected error code of 256 is impossible (exit status is truncated to 8 bits). Also, your hand-rolled Makefile.in files don't support --datarootdir. I'm not sure whether you are interested in migrating to using Automake, which would solve a number of these issues; let me know if you would be interested in such a patch. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFITZth7M8hyUobTrERApkTAJ95Xll+H1vZaMYtrBRgRGedFUGP1QCZAVeP JPBle23eqa0JpuCIdX37c6U= =yjZr -END PGP SIGNATURE-
Re: FW: GNU Coding Standard compliance
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Micah Cowan wrote: Also, I'm not sure that Eric is subscribed to the ML, so he may not have gotten your message (I've added him to the recipients). (Obviously, this was rectified, and well before I wrote this. However, it apparently took four days for either of these messages to be delivered to sunsite.dk from gnu.org, so I hadn't gotten the fixed version before I sent this.) Hm, according to http://dotsrc.org/, their servers were down, so we may be catching up for a bit here as delayed mails are retried over the next couple days. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFITZ0n7M8hyUobTrERAjXOAJ9YCcwXz+gmC4wEjIj8wmF5ggpLSACcD+hA hvDA5+9BLJH9qIXaB2QHJoA= =zjc9 -END PGP SIGNATURE-
Re: getpass alternative [Re: getpass documentation]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Matthew Woehlke wrote: Micah Cowan wrote: Wget just added code to the dev repo that uses gnu_getpass to support password prompting. This was mainly because it was quick-and-easy, and because gnu_getpass doesn't suffer from many of the serious flaws plaguing alternative implementations. Hehe, earlier today I merged my old, lame-but-functional patch with 1.11.3 (I've changed systems since last time). Does this mean that when fedora picks up 1.12 (after there *is* a 1.12 obviously :-) ) that I won't need to roll my own any more? ;-) That's the plan. I guess you're quoting from my post to the gnulib list? :) For those not in the loop, the context of the thread above is discussion of a more general password-getting solution, for folks that don't need something that adheres to the (formerly) standard Unix getpass interface. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIRjOl7M8hyUobTrERAr85AJ922yvYJuNz6ZCB3isah8kwguWSnwCeKKcA rOm0SJPErGDtt7VaLgg9J5w= =7Mz/ -END PGP SIGNATURE-
Re: GNU Coding Standard compliance
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Eric Blake wrote: I'm trying to package wget-1.11.3 for cygwin. But you have several GNU Coding Standard compliance problems that are making this task more difficult than it should be. GCS requires that your testsuite be run by 'make check', but yours is a no-op. Instead, you provide 'make test', but that fails to compile if you use a VPATH build. And even when using an in-tree build, it fails as follows: ./Test-proxied-https-auth.px echo echo /bin/sh: ./Test-proxied-https-auth.px: No such file or directory After commenting that line out, the following tests are also missing: ./Test-proxy-auth-basic.px ./Test-N-current-HTTP-CD.px Test-N-HTTP-Content-Disposition.px fails, since it didn't add the --content-disposition flag to the wget invocation. Several Test--spider-* tests fail, because an expected error code of 256 is impossible (exit status is truncated to 8 bits). Also, your hand-rolled Makefile.in files don't support --datarootdir. I'm not sure whether you are interested in migrating to using Automake, which would solve a number of these issues; let me know if you would be interested in such a patch. We actually have already migrated to Automake in the mainline revision, which we forked some time ago. 1.11.x development has focused on important bugfixes only. The issues with the tests are known, and documented (see tests/README). They are provided as-is; a work-in-progress, and not really expected to be terribly useful. I'm actually working on improving this process right now (and in fact, the current mainline is already much-improved in this regard, thanks to some recent commits). In the mainline repository, make check works as expected (modulo some remaining issues with the tests, such as intermittent failures due to the fact that all the tests use the same web-server port for testing, and don't always wait quite long enough for reuse; I'll have that fixed soon). 
I would definitely recommend that make test be abandoned altogether; alternatively, you could probably modify tests/Makefile.in to match current mainline, which now runs a run-px script, rather than all those hideous ./Test-foo.px echo echo lines in Makefile.in proper (the tests from mainline should run fine on 1.11.3, I believe). It would still need some work, as I mention, to really be reliable, but at least there aren't glaring issues with broken and missing tests (and it runs via the expected make target). Good luck with the packaging. - -- HTH, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIRsTS7M8hyUobTrERApudAJ9ugo0WsAL/gkJud1fK4Ip3+vDFSgCeMGvQ XFuwWhMOlGdeOx90BGoWyOA= =q4wa -END PGP SIGNATURE-
Re: getpass alternative [Re: getpass documentation]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Matthew Woehlke wrote: Micah Cowan wrote: Matthew Woehlke wrote: Micah Cowan wrote: Wget just added code to the dev repo that uses gnu_getpass to support password prompting. This was mainly because it was quick-and-easy, and because gnu_getpass doesn't suffer from many of the serious flaws plaguing alternative implementations. Hehe, earlier today I merged my old, lame-but-functional patch with 1.11.3 (I've changed systems since last time). Does this mean that when fedora picks up 1.12 (after there *is* a 1.12 obviously :-) ) that I won't need to roll my own any more? ;-) That's the plan. Great news, thanks! Now... if Fedora will just pick it up... :-) I don't see why they wouldn't; they're up-to-date with 1.11.3, as of today, which seems like a pretty quick pick-up, to me. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIRsjk7M8hyUobTrERAgXYAJ9jTqhrbAVZfl5//f7cFjzpw2rohACfYVXu Sj9P/t1lD/1S2wQ0uaukLgc= =mJsC -END PGP SIGNATURE-
Re: mail-archive.com archive ends at 2008-04-07
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Micah Cowan wrote: Micah Cowan wrote: Ryan Schmidt wrote: The wget site [1] lists two sites [2] [3] hosting the mailing list archives. The gmane.org archive is current but the mail-archive.com site only has messages up through April 7, 2008. Any idea how to get that archive up to date again? Hm, you're right; I'd been relying mainly on the gmane one, so hadn't noticed. I suppose the staff should be contacted about that. This may be relevant: http://www.mail-archive.com/[EMAIL PROTECTED]/msg01261.html Not sure it explains the 1½ months of missing mails, but it sounds like maybe I should wait a couple days before worrying about it. So, it turns out the archive address for mail-archive.com was unsubscribed from wget@sunsite.dk, due to message bounces. It should be back on now. I'm not sure about the missing month or so of messages, though. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIQy8D7M8hyUobTrERAovBAJ9k5jQjOIo/JZjB6I9Jf8nsY+ZziACghdqO RnjUMT1ePtVlHk1sDpSxl6g= =hEBA -END PGP SIGNATURE-
Re: About Automated Unit Test for Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Micah Cowan wrote: Yeah. But we're not doing streaming. And you still haven't given much explanation for _why_ it's as hard and time-consuming as you say. Making a claim and demonstrating it are different things, I think. To be clear, I'm not trying to say, I don't believe you; I'm saying, argue the case, please, don't just make assertions. Clearly, you're concerned about something I'm unable to see: help me to see it! If I ignore your warnings, and wind up running headlong into what you saw in the first place, you can't claim you gave fair warning if you didn't provide examples of what I might run into. For my part, I see something which, at least for first cut, I could whip up in a couple of hours (the server emulation and associated state-tracking, of course, would be _quite_ a bit more work). What is it that causes our two perspectives to differ so wildly? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+KfI7M8hyUobTrERAt4YAKCKSfG/1HtV29mm1MSdDyzFuS8lRQCfdVla EIpSSdKhguieVxgYXln+XiQ= =mMj2 -END PGP SIGNATURE-
Re: About Automated Unit Test for Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Yoshihiro Tanaka wrote: 2008/4/5, Yoshihiro Tanaka [EMAIL PROTECTED]: Yes, since I want to write a proposal for unit testing, I can't skip this problem. But considering the GSoC program is only two months, I'd rather narrow down the target to the gethttp function. I have a sneaking suspicion that some chunks of functionality that you'd want to farm out in gethttp also have code-change repercussions elsewhere (probably http_loop, usually). So it may be difficult to restrict yourself to gethttp. :) Probably better to identify the specific chunks of logic that can be farmed out, find out how far-reaching separating those chunks might be, and choose some specific ones to do. You've already identified some areas; I'll comment on those when I have a chance to look more closely at the code, for comparison with your remarks. In addition to the above, we have to think about abstraction of the network API and file I/O API. But the network API (such as fd_read_body, fd_read_hunk) lives in retr.c, and sockets are opened in connect.c, so it looks like abstracting the network API would require major modification of the interfaces. Or did you mean to write a wget version of the socket interface? i.e., to write our own versions of socket, connect, write, read, close, bind, listen, accept...? Sorry, I'm confused. Yes! That's what I meant. (Except, we don't need listen, accept; and we only need bind to support --bind-address. We're a client, not a server. ;) ) It would be enough to write function-pointers for (say) wg_socket, wg_connect, wg_sock_write, wg_sock_read, etc, etc, and point them at the system socket, connect, etc for real Wget, but at wg_test_socket, wg_test_connect, etc for our emulated servers. This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions) from uses of read(), write(), etc on sockets, which would be using our emulated versions. 
Ideally, we'd replace the use of file descriptor ints with a more opaque mechanism; but that can be done later. If you'd prefer, you might choose to write a proposal focusing on the server emulation, which would easily take up a summer by itself (and then some); particularly when you realize that we would need a file format describing the virtual server's state (what domains and URLs exist, what sort of headers it should respond with to certain requests, etc). If you chose to take that on, you'd probably need to settle for a subset of the final expected product. Note that, down the road, we'll want to encapsulate the whole sockets-layer abstraction into an object we'd pass around as an argument (struct net_connector *, perhaps), as we might want to use it to handle SOCKS for some URLs, while using direct connections for others. But that doesn't have to happen right now; once we've got the actual abstraction done it should be pretty easy to move it to an object-based mechanism (just use conn->connect(...) instead of wg_connect(...)). But, if you want to go ahead and do that now, that'd be great too. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH9+7p7M8hyUobTrERApu6AKCENiEExoyTHxDUodnr/AIcRx8BOgCcD89N k6ANTdl+4fgb+4trcADXnO0= =fmya -END PGP SIGNATURE-
Re: About Automated Unit Test for Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Daniel Stenberg wrote: On Sat, 5 Apr 2008, Micah Cowan wrote: Or did you mean to write wget version of socket interface? i.e. to write our version of socket, connect,write,read,close,bind, listen,accept,,,? sorry I'm confused. Yes! That's what I meant. (Except, we don't need listen, accept; and we only need bind to support --bind-address. We're a client, not a server. ;) ) Except, you do need listen, accept and bind in a server sense since even if wget is a client I believe it still supports the PORT command for ftp... Damn FTP... :) Yeah, of course. Sorry, my view of the web tends frequently to be very HTTP-colored. :) (Well, technically, that _is_ the WWW, but anyway...) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+ENm7M8hyUobTrERAlewAJ9W+vriWeVptJWG72Q3F0Njpt9TZgCfeZI4 An3zovMEfIEd1W1o7hqe5q0= =TKsW -END PGP SIGNATURE-
Re: About Automated Unit Test for Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Daniel Stenberg wrote: In the curl project we took a simpler route: we have our own dumb test servers in the test suite to run tests against, and we have single files that describe each test case: what the server should respond, what the protocol dump should look like, what output to expect, what return code, etc. Then we have a script that reads the test case description, fires up the correct server(s), and verifies all the outputs (optionally using valgrind). This system allows us to write unit tests if we'd like to, but mostly so far we've focused on testing it system-wide. It is hard enough for us! Yeah, I thought I'd seen something like that; I was thinking we might even be able to appropriate some of that, if that looked doable. Except that I preferred faking the server completely, so I could deal better with cross-site issues, which AFAICT are significantly more important to Wget than they are to Curl. I was thinking, and should have said, that if we go this route, we'd want to focus on high-level tests first. That also has the advantage that if we accidentally change something during the refactoring process (not unlikely), we will notice it, whereas focusing just on unit tests would mean we'd have to change the code to be testable in units _before_ verification. We already _do_ have some spawn-a-server test code, but much of it needs rewriting, and it still suffers when you bring in the idea of multiple servers. The servers are driven by Perl code, rather than a driver script or description file. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+EZz7M8hyUobTrERAjDxAJ9N3AbEVG6NTy735hy6KtjPO7jm8wCdFX+/ gLx9jZcp0ZQqE2bQAU7VdyQ= =u+PC -END PGP SIGNATURE-
Re: About Automated Unit Test for Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hrvoje Niksic wrote: Micah Cowan [EMAIL PROTECTED] writes: Or did you mean to write wget version of socket interface? i.e. to write our version of socket, connect,write,read,close,bind, listen,accept,,,? sorry I'm confused. Yes! That's what I meant. (Except, we don't need listen, accept; and we only need bind to support --bind-address. We're a client, not a server. ;) ) It would be enough to write function-pointers for (say), wg_socket, wg_connect, wg_sock_write, wg_sock_read, etc, etc, and point them at system socket, connect, etc for real Wget, but at wg_test_socket, wg_test_connect, etc for our emulated servers. This seems like a neat idea, but it should be carefully weighed against the drawbacks. Adding an ad-hoc abstraction layer is harder than it sounds, and has more repercussions than is immediately obvious. An underspecified, unfinished abstraction layer over sockets makes the code harder, not easier, to follow and reason about. You no longer deal with BSD sockets, you deal with an abstraction over them. Is it okay to call getsockname on such a socket? How about setsockopt? What about the listen/bind mechanism (which we do need, as Daniel points out)? I'm having some trouble seeing how most of those present problems. Obviously, you wouldn't call _any_ system functions on these, so yeah, no setsockopt() unless it's a wg_setsockopt() (a wg_setsockopt would probably be a poor way to handle it anyway, as it'd be mainly true-TCP specific). I don't see what you see wrt making the code harder to follow and reason about (true abstraction rarely does, AFAICT, though there are some counter-examples, usually of things that are much, much more abstract than we are used to thinking about). Did you have some specific concerns? 
I _am_ thinking that it'd probably be best to forgo the idea of one-to-one correspondence of Berkeley sockets, and pass around a struct net_connector * (and struct net_listener *), so we're not forced to deal with file descriptor silliness (where obviously we'd have wanted to avoid the values 0 through 2, and I was even thinking it might _possibly_ be worthwhile to allocate real file descriptors to get the numbers, just to avoid clashes). Then we can focus on actual abstraction (which we don't obtain by emulating Berkeley sockets), rather than just emulation. While Daniel was of course right that we'd need listen, accept, etc, we _wouldn't_ need them to begin using this layer to test against http.c. We wouldn't even need bind, if we didn't include --bind-address in our first tests of the http code. This would mean we'd need to separate uses of read() and write() on normal files (which should continue to use the real calls, until we replace them with the file I/O abstractions), from uses of read(), write(), etc on sockets, which would be using our emulated versions. Unless you're willing to spend a lot of time in careful design of these abstractions, I think this is a mistake. Why? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+Eoi7M8hyUobTrERAj3VAJ4vb/SPNkNo+Xyd2Hq09U4ey6zJJwCfVmG0 NSVpzr7IEdpUQkTwy/j2z9E= =9lKJ -END PGP SIGNATURE-