bug in wget
Hello, entering the following command results in an error: --- command start --- c:\Downloads\wget_v1.11.3b>wget ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/; -P c:\Downloads\ --- command end --- wget can't convert the .listing file into an HTML file. Regards
Re: bug in wget
Sir Vision wrote: Hello, entering the following command results in an error: --- command start --- c:\Downloads\wget_v1.11.3b>wget ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/; -P c:\Downloads\ --- command end --- wget can't convert the .listing file into an HTML file As this seems to work fine on Unix, for me, I'll have to leave it to the Windows porting guy (hi Chris!) to find out what might be going wrong. ...however, it would really help if you would supply the full output you got from wget that leads you to believe Wget couldn't do this conversion. In fact, it wouldn't hurt to supply the -d flag as well, for maximum debugging messages. -- Cheers, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer. http://micah.cowan.name/
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes: The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug). I'm not clear on exactly what's possibly a bug: do you mean the fact that Wget only calls attention to it in -d mode? That's what I meant. I probably agree with that behavior... most people probably aren't interested in being informed that a server breaks RFC 2616 mildly; Generally, if Wget considers a header to be in error (and hence ignores it), the user probably needs to know about that. After all, it could be the symptom of a Wget bug, or of an unimplemented extension the server generates. In both cases I as a user would want to know. Of course, Wget should continue to be lenient towards syntax violations widely recognized by popular browsers. Note that I'm not arguing that Wget should warn in this particular case. It is perfectly fine to not consider an empty `Set-Cookie' to be a syntax error and to simply ignore it (and maybe only print a warning in debug mode).
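The lenient behavior both participants settle on above can be sketched in C. This is an illustrative helper, not Wget's actual parser; the names handle_set_cookie and debug_enabled are assumptions for the sketch:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the agreed behavior: an empty Set-Cookie value is not a
   hard syntax error; it is simply ignored, with a note emitted only
   when debug output (-d) is enabled. */
static int debug_enabled = 0;

/* Returns 1 if the header carried a usable cookie, 0 if it was ignored. */
int handle_set_cookie (const char *value)
{
  /* Skip leading whitespace. */
  while (*value == ' ' || *value == '\t')
    value++;

  if (*value == '\0')
    {
      if (debug_enabled)
        fprintf (stderr, "debug: ignoring empty Set-Cookie header\n");
      return 0;                 /* silently ignored in normal mode */
    }

  /* A real parser would tokenize NAME=VALUE; attr=... here. */
  return strchr (value, '=') != NULL;
}
```

The point of the sketch is only the control flow: an empty value short-circuits to "ignore" rather than to an error path.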
Re: bug on wget
Hrvoje Niksic wrote: Generally, if Wget considers a header to be in error (and hence ignores it), the user probably needs to know about that. After all, it could be the symptom of a Wget bug, or of an unimplemented extension the server generates. In both cases I as a user would want to know. Of course, Wget should continue to be lenient towards syntax violations widely recognized by popular browsers. Note that I'm not arguing that Wget should warn in this particular case. It is perfectly fine to not consider an empty `Set-Cookie' to be a syntax error and to simply ignore it (and maybe only print a warning in debug mode). That was my thought. I agree with both of your points above: if Wget's not handling something properly, I want to know about it; but at the same time, silently ignoring (erroneous) empty headers doesn't seem like a problem. -- Micah J. Cowan http://micah.cowan.name/
bug on wget
Hi, I got a bug in wget when executing: wget -a log -x -O search/search-1.html --verbose --wait 3 --limit-rate=20K --tries=3 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1 Segmentation fault (core dumped) I created the directory search. The above creates a zero-sized file search/search-1.html. Logfile log: Resolviendo www.nepremicnine.net... 212.103.144.204 Conectando a www.nepremicnine.net|212.103.144.204|:80... conectado. Petición HTTP enviada, esperando respuesta... 200 OK --18:18:28-- http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1 => `search/search-1.html' (I hope you understand the Spanish above. If not, the labels are the usual: resolving, connecting, HTTP request sent, awaiting response) The same happens when varying the url parameter id_regije, just in case it helps. I'm using an Intel Core Duo E6300 with plenty of disk/memory space, Ubuntu 7.10. Should you need any further information, don't hesitate to contact me. Regards Diego
Re: bug on wget
Diego Campo wrote: Hi, I got a bug in wget when executing: wget -a log -x -O search/search-1.html --verbose --wait 3 --limit-rate=20K --tries=3 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1 Segmentation fault (core dumped) Hi Diego, I was able to reproduce the problem above in the release version of Wget; however, it appears to work fine in the current development version of Wget, which is expected to be released soon as version 1.11.* * Unfortunately, it has been expected to release soon for a few months now; we got hung up with some legal/licensing issues that are yet to be resolved. It will almost certainly be released in the next few weeks, though. -- Micah J. Cowan http://micah.cowan.name/
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes: I was able to reproduce the problem above in the release version of Wget; however, it appears to be working fine in the current development version of Wget, which is expected to release soon as version 1.11.* I think the old Wget crashed on empty Set-Cookie headers. That got fixed when I converted the Set-Cookie parser to use extract_param. The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug).
Re: bug on wget
Hrvoje Niksic wrote: Micah Cowan [EMAIL PROTECTED] writes: I was able to reproduce the problem above in the release version of Wget; however, it appears to be working fine in the current development version of Wget, which is expected to release soon as version 1.11.* I think the old Wget crashed on empty Set-Cookie headers. That got fixed when I converted the Set-Cookie parser to use extract_param. The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug). I'm not clear on exactly what's possibly a bug: do you mean the fact that Wget only calls attention to it in -d mode? I probably agree with that behavior... most people probably aren't interested in being informed that a server breaks RFC 2616 mildly; especially if it's not apt to affect the results. Unless of course the user was expecting that the server send a real cookie, but I'm guessing this only happens when the server doesn't have one to send (or something). But a user in that situation should be using -d (or at least -S) to find out what the server is sending. -- Micah J. Cowan http://micah.cowan.name/
Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.
Mauro Tortonesi wrote: Micah Cowan ha scritto: Update of bug #20323 (project wget): Status: Ready For Test => In Progress Follow-up Comment #3: Moving back to In Progress until some questions about the logic are answered: http://addictivecode.org/pipermail/wget-notify/2007-July/75.html http://addictivecode.org/pipermail/wget-notify/2007-July/77.html thanks micah. i have partly misunderstood the logic behind the preliminary HEAD request. in my code, HEAD is skipped if -O or --no-content-disposition are given, but if -N is given HEAD is always sent. this is wrong, as HEAD should be skipped even if -N and --no-content-disposition are given (no need to care about the deprecated -N -O combination). can't think of any other case in which HEAD should be skipped, though. Cc'ing the wget ML, as it's probably important to open up discussion of the current logic. What about the case when nothing is given on the command line except --no-content-disposition? What do we need HEAD for then? Also: I don't believe HEAD should be sent if no options are given on the command line. What purpose would that serve? If it's to find a possible Content-Disposition header, we can get that (and more reliably) at GET time (though I believe we may currently be requiring the file name before we fetch, which, if true, should definitely be changed -- but not for 1.11, in which case the HEAD will be allowed for the time being); and since we're not matching against potential accept/reject lists, we don't really need it. I think it really makes much more sense to enumerate those few cases where we need to issue a HEAD, rather than try to determine all the cases where we don't: if I have to choose a side to err on, I'd rather not send HEAD in a case or two where we needed it, rather than send it in a few where we didn't, as any request-response cycle eats up time. I also believe that the cases where we want a HEAD are/should be fewer than the cases where we don't want them. -- Micah J. Cowan http://micah.cowan.name/
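The "enumerate the few cases that need HEAD" approach argued for here can be sketched as a single predicate. The condition below mirrors the one used in the 1.11 development patch posted elsewhere in this thread (spider, timestamping, and resume); the struct and function names are illustrative, not Wget's:

```c
#include <stdbool.h>

/* Illustrative subset of Wget's option state. */
struct opts {
  bool spider;        /* --spider */
  bool recursive;     /* -r */
  bool timestamping;  /* -N */
  bool always_rest;   /* -c */
};

/* got_head / got_name: per-download state carried across retries,
   recording whether a HEAD has already answered the open question. */
bool should_send_head (const struct opts *o, bool got_head, bool got_name)
{
  return (o->spider && !o->recursive)       /* only checking existence */
      || (o->timestamping && !got_head)     /* need Last-Modified first */
      || (o->always_rest && !got_name);     /* need the final name first */
}
```

With no options set, the predicate is false and Wget goes straight to GET, which is the default behavior Micah argues for.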
[Fwd: Bug#281201: wget prints it's progress even when background]
The following bug was submitted to Debian's bug tracker. I'm curious what people think about this suggestion. Don't we already check for something like redirected output (and force the progress indicator to dots)? It seems to me that if that is appropriate, then a case could be made for this as well. Perhaps instead of shutting up, though, wget should attempt to redirect its progress output to a file? Perhaps with one last message to the terminal (assuming the terminal doesn't have TOSTOP set -- it should ignore SIGTTOU and handle EIO to cover that case), to indicate that it's doing this. -Micah ----- Original Message ----- Subject: Bug#281201: wget prints it's progress even when background Resent-Date: Tue, 10 Jul 2007 13:57:01 +0000, Tue, 10 Jul 2007 13:57:02 +0000 Resent-From: Ilya Anfimov [EMAIL PROTECTED] Resent-To: [EMAIL PROTECTED] Resent-CC: Noèl Köthe [EMAIL PROTECTED] Date: Tue, 10 Jul 2007 17:54:51 +0400 From: Ilya Anfimov [EMAIL PROTECTED] Reply-To: Ilya Anfimov [EMAIL PROTECTED], [EMAIL PROTECTED] To: Peter Eisentraut [EMAIL PROTECTED] CC: [EMAIL PROTECTED] My suggestion is to stop printing verbose progress messages when the job is resumed in the background. It could be checked by (successful) getpgrp() not equal to (successful) tcgetpgrp(1) in a SIGCONT signal handler. Something like this is used in some console applications, for example in lftp.
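The check Ilya suggests (comparing the process group with the terminal's foreground process group) can be sketched as a small helper. This is an illustrative function, not Wget code; a real implementation would run the check from a SIGCONT handler as the report suggests:

```c
#include <sys/types.h>
#include <unistd.h>

/* Sketch: a job resumed in the background has a process group that
   differs from the controlling terminal's foreground process group.
   Returns 1 if we appear to be backgrounded, 0 otherwise. */
int running_in_background (void)
{
  pid_t pgrp = getpgrp ();
  pid_t fg = tcgetpgrp (STDOUT_FILENO);
  if (fg == (pid_t) -1)
    return 0;   /* stdout is not a terminal; the question doesn't apply */
  return pgrp != fg;
}
```

Note that calling tcgetpgrp on a non-terminal fails with ENOTTY, which is treated here as "not backgrounded" since the redirected-output case is already handled separately.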
possible bug in wget-1.10.2 and earlier
Hi, wget appears to be confused by FTP servers that leave only one space between the group name and the file-size field in their listings. We only came across this problem today so I don't know how common it is. pjjH From: Harrington, Paul Sent: Thursday, May 31, 2007 12:06 AM To: recipient-removed Subject: RE: File issue using WGET Your FTP server must have changed the output of the listing format or, more precisely, the string representation of some of the components has changed such that only one space separates the group name from the file-size. The bug is, of course, with wget, but it is one that hitherto had not been observed when interacting with your FTP server. pjjH [EMAIL PROTECTED]

diff -u ftp-ls.c ~/tmp
--- ftp-ls.c	2005-08-04 17:52:33.0 -0400
+++ /u/harringp/tmp/ftp-ls.c	2007-05-31 00:02:07.209955000 -0400
@@ -229,6 +229,18 @@
           break;
         }
       errno = 0;
+      /* After the while loop terminates, t may not always point to a
+         space character.  In the case when there is only one space
+         between the user/group information and the file-size, the
+         space will have been overwritten by a '\0' via strtok().  So,
+         if you have been through the loop at least once, advance
+         forward one character. */
+      if (t > ptok)
+        t++;
+
       size = str_to_wgint (t, NULL, 10);
       if (size == WGINT_MAX && errno == ERANGE)
         /* Out of range -- ignore the size.  Should
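The pitfall Paul describes can be reproduced in miniature: strtok() overwrites the delimiter after a token with '\0', so a scan pointer that walks to the end of the token lands on that '\0' when only one space separated the fields, and must be advanced by one. This is a hypothetical reduction of the bug, not the actual ftp-ls.c code:

```c
#include <string.h>
#include <stdlib.h>

/* Parse the numeric size that follows the first token of `line`.
   Illustrates the one-space FTP-listing bug and its fix: after walking
   past the token, the pointer may sit on the '\0' that strtok() wrote
   over the single separating space, so it must be advanced once. */
long parse_size_after_group (char *line)
{
  char *tok = strtok (line, " ");  /* group token; the space after it
                                      becomes '\0' */
  char *t = tok;
  while (*t)                       /* walk to the end of the token */
    t++;
  /* Without this advance, strtol() would see an empty string and
     return 0 in the one-space case. */
  if (t > tok)
    t++;
  return strtol (t, NULL, 10);
}
```

With two or more separating spaces the advance is harmless, because strtol() skips the remaining whitespace anyway; that is why the bug stayed hidden until a server tightened its listing format.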
Bug-report: wget with multiple cnames in ssl certificate
Hi If I connect with wget 1.10.2 (Debian Etch / Ubuntu Feisty Fawn) to a secure host that uses multiple cnames in the certificate, I get the following error: [EMAIL PROTECTED]:~$ wget https://host.domain.tld --10:18:55-- https://host.domain.tld/ => `index.html' Resolving host.domain.tld... xxx.xxx.xxx.xxx Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected. ERROR: certificate common name `host0.domain.tld' doesn't match requested host name `host.domain.tld'. To connect to host.domain.tld insecurely, use `--no-check-certificate'. Unable to establish SSL connection. If I do the same with wget 1.9.1 (Debian Sarge) I do not get that error. Kind regards, Alex Antener -- Alex Antener Dipl. Medienkuenstler FH [EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63 GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php Fingerprint: BAB6 E61B 17D7 A9C9 6313 5141 3A3C DAA3 14D3 C7A1
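The behavior the report asks for amounts to accepting the certificate if any of its names (the CN plus any subjectAltName entries) matches the requested host, rather than comparing against the CN alone. A minimal sketch, assuming an exact case-insensitive match only (real verification also handles wildcards, and this is not Wget's OpenSSL code):

```c
#include <strings.h>
#include <stdbool.h>

/* Return true if `host` matches ANY of the `n` names extracted from
   the certificate.  Illustrative helper: names[] stands in for the
   CN plus subjectAltName entries. */
bool cert_name_matches (const char *host, const char *const *names, int n)
{
  for (int i = 0; i < n; i++)
    if (strcasecmp (names[i], host) == 0)
      return true;
  return false;
}
```

Under this scheme, a certificate carrying both `host0.domain.tld` and `host.domain.tld` would validate a request for `host.domain.tld`, matching the 1.9.1 behavior the reporter expects.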
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Jochen Roderburg wrote: I have now tested the new wget 1.11 beta1 on my Linux system and the above issue is solved now. The "Remote file is newer" message now only appears when the local file exists, and most of the other logic with time-stamping and file-naming works as expected. excellent. I meanwhile found, however, another new problem with time-stamping, which mainly occurs in connection with a proxy-cache; I will report that in a new thread. Same for a small problem with the SSL configuration. thank you very much for the useful bug reports you keep sending us ;-) -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Jochen Roderburg ha scritto: Zitat von Jochen Roderburg [EMAIL PROTECTED]: Zitat von Hrvoje Niksic [EMAIL PROTECTED]: Mauro, you will need to look at this one. Part of the problem is that Wget decides to save to index.html.1 although -c is in use. That is solved with the patch attached below. But the other part is that hstat.local_file is a NULL pointer when stat(hstat.local_file, &st) is used to determine whether the file already exists in the -c case. That seems to be a result of your changes to the code -- previously, hstat.local_file would get initialized in http_loop. This looks as if it could also be the cause of the problems which I reported some weeks ago for the timestamping mode (http://www.mail-archive.com/wget@sunsite.dk/msg09083.html) Hello Mauro, The timestamping issues I reported in the above-mentioned message are now also repaired by the patch you mailed here last week. Only the small *cosmetic* issue remains that it *always* says: "Remote file is newer, retrieving." even if there is no local file yet. hi jochen, i have been working on the problem you reported for the last couple of days. i've just committed a patch that should fix it for good. could you please try the new HTTP code and tell me if it works properly? thank you very much for your help. -- Mauro Tortonesi http://www.tortonesi.com
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Zitat von Jochen Roderburg [EMAIL PROTECTED]: Zitat von Hrvoje Niksic [EMAIL PROTECTED]: Mauro, you will need to look at this one. Part of the problem is that Wget decides to save to index.html.1 although -c is in use. That is solved with the patch attached below. But the other part is that hstat.local_file is a NULL pointer when stat(hstat.local_file, &st) is used to determine whether the file already exists in the -c case. That seems to be a result of your changes to the code -- previously, hstat.local_file would get initialized in http_loop. This looks as if it could also be the cause of the problems which I reported some weeks ago for the timestamping mode (http://www.mail-archive.com/wget@sunsite.dk/msg09083.html) Hello Mauro, The timestamping issues I reported in the above-mentioned message are now also repaired by the patch you mailed here last week. Only the small *cosmetic* issue remains that it *always* says: "Remote file is newer, retrieving." even if there is no local file yet. J.Roderburg
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic ha scritto: Noèl Köthe [EMAIL PROTECTED] writes: a wget -c problem report with the 1.11 alpha 1 version (http://bugs.debian.org/378691): I can reproduce the problem. If I have already 1 MB downloaded, wget -c doesn't continue. Instead it starts to download again: Mauro, you will need to look at this one. Part of the problem is that Wget decides to save to index.html.1 although -c is in use. That is solved with the patch attached below. But the other part is that hstat.local_file is a NULL pointer when stat(hstat.local_file, &st) is used to determine whether the file already exists in the -c case. That seems to be a result of your changes to the code -- previously, hstat.local_file would get initialized in http_loop. The partial patch follows:

Index: src/http.c
===================================================================
--- src/http.c (revision 2178)
+++ src/http.c (working copy)
@@ -1762,7 +1762,7 @@
       return RETROK;
     }
-  else
+  else if (!ALLOW_CLOBBER)
     {
       char *unique = unique_name (hs->local_file, true);
       if (unique != hs->local_file)

you're right, of course. the patch included in attachment should fix the problem. since the new HTTP code supports Content-Disposition and delays the decision of the destination filename until it receives the response header, the best solution i could find to make -c work is to send a HEAD request to determine the actual destination filename before resuming the download if -c is given. please, let me know what you think. -- Mauro Tortonesi http://www.tortonesi.com

Index: http.c
===================================================================
--- http.c (revisione 2178)
+++ http.c (copia locale)
@@ -1762,7 +1762,7 @@
       return RETROK;
     }
-  else
+  else if (!ALLOW_CLOBBER)
     {
       char *unique = unique_name (hs->local_file, true);
       if (unique != hs->local_file)
@@ -2231,6 +2231,7 @@
 {
   int count;
   bool got_head = false;         /* used for time-stamping */
+  bool got_name = false;
   char *tms;
   const char *tmrate;
   uerr_t err, ret = TRYLIMEXC;
@@ -2264,7 +2265,10 @@
   hstat.referer = referer;

   if (opt.output_document)
+    {
       hstat.local_file = xstrdup (opt.output_document);
+      got_name = true;
+    }

   /* Reset the counter. */
   count = 0;
@@ -2309,13 +2313,16 @@
       /* Default document type is empty.  However, if spider mode is
          on or time-stamping is employed, HEAD_ONLY commands is
          encoded within *dt.  */
-      if ((opt.spider && !opt.recursive) || (opt.timestamping && !got_head))
+      if ((opt.spider && !opt.recursive)
+          || (opt.timestamping && !got_head)
+          || (opt.always_rest && !got_name))
         *dt |= HEAD_ONLY;
       else
         *dt &= ~HEAD_ONLY;

       /* Decide whether or not to restart. */
       if (opt.always_rest
+          && got_name
           && stat (hstat.local_file, &st) == 0
           && S_ISREG (st.st_mode))
         /* When -c is used, continue from on-disk size.  (Can't use
@@ -2484,6 +2491,12 @@
           continue;
         }

+      if (opt.always_rest && !got_name)
+        {
+          got_name = true;
+          continue;
+        }
+
       if ((tmr != (time_t) (-1))
           && (!opt.spider || opt.recursive)
           && ((hstat.len == hstat.contlen) ||

Index: ChangeLog
===================================================================
--- ChangeLog (revisione 2178)
+++ ChangeLog (copia locale)
@@ -1,3 +1,9 @@
+2006-08-16  Mauro Tortonesi  [EMAIL PROTECTED]
+
+	* http.c: Fixed bug which broke --continue feature.  Now if -c is
+	given, http_loop sends a HEAD request to find out the destination
+	filename before resuming download.
+
 2006-08-08  Hrvoje Niksic  [EMAIL PROTECTED]

 	* utils.c (datetime_str): Avoid code repetition with time_str.
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Mauro Tortonesi [EMAIL PROTECTED] writes: you're right, of course. the patch included in attachment should fix the problem. since the new HTTP code supports Content-Disposition and delays the decision of the destination filename until it receives the response header, the best solution i could find to make -c work is to send a HEAD request to determine the actual destination filename before resuming download if -c is given. please, let me know what you think. I don't like the additional HEAD request, but I can't think of a better solution.
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic ha scritto: Mauro Tortonesi [EMAIL PROTECTED] writes: you're right, of course. the patch included in attachment should fix the problem. since the new HTTP code supports Content-Disposition and delays the decision of the destination filename until it receives the response header, the best solution i could find to make -c work is to send a HEAD request to determine the actual destination filename before resuming download if -c is given. please, let me know what you think. I don't like the additional HEAD request, but I can't think of a better solution. same for me. in order to avoid the overhead of the extra HEAD request, i had considered disabling Content-Disposition and using url_file_name to determine the destination filename in case -c is given. but i really didn't like that solution. -- Mauro Tortonesi http://www.tortonesi.com
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Hrvoje Niksic wrote: Noèl Köthe [EMAIL PROTECTED] writes: a wget -c problem report with the 1.11 alpha 1 version (http://bugs.debian.org/378691): I can reproduce the problem. If I have already 1 MB downloaded wget -c doesn't continue. Instead it starts to download again: Mauro, you will need to look at this one. i surely will. unfortunately, at the moment i am attending the winsys 2006 research conference: http://www.winsys.org i'll take a look at the problem as soon as i get back to italy. -- Mauro Tortonesi http://www.tortonesi.com
wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't work with HTTP]
Hello, a wget -c problem report with the 1.11 alpha 1 version (http://bugs.debian.org/378691): I can reproduce the problem. If I already have 1 MB downloaded, wget -c doesn't continue. Instead it starts to download again: -------- Forwarded Message -------- [EMAIL PROTECTED]:~$ strace -o wget-strace wget -c http://ftp.iasi.roedu.net/100MB --14:28:07-- http://ftp.iasi.roedu.net/100MB Resolving ftp.iasi.roedu.net... 192.129.4.120 Connecting to ftp.iasi.roedu.net|192.129.4.120|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 104857600 (100M) [text/plain] Saving to: `100MB.8' The HTTP conversation: GET /100MB HTTP/1.0 User-Agent: Wget/1.11-alpha-1 Accept: */* Host: ftp.iasi.roedu.net Connection: Keep-Alive HTTP/1.1 200 OK Date: Tue, 18 Jul 2006 11:24:14 GMT Server: Apache/2.2.2 (Unix) Last-Modified: Sat, 03 Dec 2005 09:14:42 GMT ETag: a002e4cb-640-1dbb0480 Accept-Ranges: bytes Content-Length: 104857600 Keep-Alive: timeout=5, max=100 Connection: Keep-Alive Content-Type: text/plain With an older version of wget, same file, same server, it works. This version works with FTP. A strace (attached) shows that it doesn't even try to see whether 100MB exists before sending the HTTP request. -- System Information: Debian Release: testing/unstable APT prefers experimental APT policy: (500, 'experimental'), (500, 'unstable'), (500, 'testing') Architecture: i386 (i686) Shell: /bin/sh linked to /bin/bash Kernel: Linux 2.6.17-1-686 Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8) Versions of packages wget depends on: ii libc6 2.3.999.2-8 GNU C Library: Shared libraries ii libssl0.9.8 0.9.8b-2 SSL shared libraries -- Noèl Köthe noel debian.org Debian GNU/Linux, www.debian.org
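For contrast with the capture in this report: a working resume first checks the local file size and then adds a Range request header, so the server can answer 206 Partial Content. A minimal sketch in C; the helper name and its use of stat() are illustrative, not Wget's code:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Sketch of what `wget -c` adds to the request.  If `path` already
   holds some bytes, write "Range: bytes=<size>-" into buf and return 1;
   return 0 when a plain GET from offset 0 is appropriate. */
int format_range_header (const char *path, char *buf, size_t bufsize)
{
  struct stat st;
  if (stat (path, &st) != 0 || st.st_size == 0)
    return 0;                   /* nothing downloaded yet: plain GET */
  snprintf (buf, bufsize, "Range: bytes=%lld-", (long long) st.st_size);
  return 1;
}
```

The strace observation in the report (no stat of 100MB before the request) is exactly the step this sketch performs first.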
Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]
Zitat von Hrvoje Niksic [EMAIL PROTECTED]: Mauro, you will need to look at this one. Part of the problem is that Wget decides to save to index.html.1 although -c is in use. That is solved with the patch attached below. But the other part is that hstat.local_file is a NULL pointer when stat(hstat.local_file, &st) is used to determine whether the file already exists in the -c case. That seems to be a result of your changes to the code -- previously, hstat.local_file would get initialized in http_loop. This looks as if it could also be the cause of the problems which I reported some weeks ago for the timestamping mode (http://www.mail-archive.com/wget@sunsite.dk/msg09083.html) J.Roderburg
Re: Bug in wget 1.10.2 makefile
Daniel Richard G. ha scritto: Hello, The MAKEDEFS value in the top-level Makefile.in also needs to include DESTDIR='$(DESTDIR)'. fixed, thanks. -- Mauro Tortonesi http://www.tortonesi.com
Bug in wget 1.10.2 makefile
Hello, The MAKEDEFS value in the top-level Makefile.in also needs to include DESTDIR='$(DESTDIR)'. (build log excerpt) + make install DESTDIR=/tmp/wget--1.10.2.build/__dest__ cd src make CC='cc' CPPFLAGS='-D__EXTENSIONS__ -D_REENTRANT -Dsparc' ... install.bin /tg/freeport/src/wget/wget--1.10.2/mkinstalldirs /tg/freeport/arch/sunos64/bin /tg/freeport/src/wget/wget--1.10.2/install-sh -c wget /tg/freeport/arch/sunos64/bin/wget cp: cannot create /tg/freeport/arch/sunos64/bin/_inst.8_: Read-only file system *** Error code 1 make: Fatal error: Command failed for target `install.bin' Current working directory /tmp/wget--1.10.2.build/src *** Error code 1 make: Fatal error: Command failed for target `install.bin' (end) --Daniel -- NAME = Daniel Richard G. ## Remember, skunks _\|/_ meef? EMAIL1 = [EMAIL PROTECTED]## don't smell bad---(/o|o\) / EMAIL2 = [EMAIL PROTECTED] ## it's the people who(^), WWW= http://www.**.org/ ## annoy them that do!/ \ -- (** = site not yet online)
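The failing install log above is consistent with DESTDIR being dropped when the top-level Makefile recurses into src/. A sketch of the kind of change Daniel suggests; the other variables shown are placeholders for whatever MAKEDEFS already passes in 1.10.2, not its actual contents:

```make
# Hypothetical excerpt of the top-level Makefile.in fix: DESTDIR must
# be forwarded to the recursive sub-make, or `make install DESTDIR=...'
# only takes effect at the top level and the sub-make installs into the
# real prefix (here, a read-only filesystem).
MAKEDEFS = CC='$(CC)' CPPFLAGS='$(CPPFLAGS)' DESTDIR='$(DESTDIR)'
```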
A bug in wget 1.10.2
Hello, I'm using wget 1.10.2 on Windows, the Windows binary version, and it has a bug when downloading with -c and with an input file. If the first file of the list is the one to be continued, wget does it fine; if not, wget tries to download the files from the beginning, and it says that it is downloading the files, but does not replace the existing ones. I'm using -nc instead, but that is not what I want, because with it wget skips existing files even if they are not fully downloaded. Sorry for my English. I hope you have understood what I was trying to say. Keep up the good work.
Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]
Noèl Köthe wrote: Hello, a forwarded report from http://bugs.debian.org/366434 could this behaviour be added to the doc/manpage? i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this: wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl as an alternative, we could choose to support multiple headers only for a few header types, like Pragma. however, i don't really like this second choice, as it would require hardcoding the above-mentioned header names in the wget sources, which IMVHO is a *VERY* bad practice. what do you think? -- Mauro Tortonesi http://www.tortonesi.com
Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]
Mauro Tortonesi [EMAIL PROTECTED] writes: Noèl Köthe wrote: Hello, a forwarded report from http://bugs.debian.org/366434 could this behaviour be added to the doc/manpage? i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this: Or by adding an `--append-header' option with that functionality. Originally --header always appended, but the problem was that people sometimes wanted to change the headers issued by Wget. The reason I didn't introduce (in fact, keep) append was that HTTP pretty much disallows duplicate headers. According to HTTP, a duplicate header field is equivalent to a single header with multiple values joined using the "," separator -- which the bug report mentions.
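The HTTP rule Hrvoje cites means a client can always canonicalize: several headers with the same field name collapse into one header whose value is the individual values joined with ",". A small sketch of that joining step; the helper is illustrative, not Wget code:

```c
#include <stdio.h>
#include <string.h>

/* Join `n` header values into out as "v1,v2,...", the single-header
   form that HTTP defines as equivalent to repeating the field name.
   Returns 0 on success, -1 if out is too small. */
int join_header_values (char *out, size_t outsize,
                        const char *const *values, int n)
{
  size_t used = 0;
  out[0] = '\0';
  for (int i = 0; i < n; i++)
    {
      int w = snprintf (out + used, outsize - used, "%s%s",
                        i ? "," : "", values[i]);
      if (w < 0 || (size_t) w >= outsize - used)
        return -1;              /* truncated */
      used += (size_t) w;
    }
  return 0;
}
```

This is exactly the transformation the Debian reporter applied by hand to make the Pragma example work.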
RE: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not suppor ted]
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]] i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this: wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl That could be a problem if you need to send a really weird custom header named dontoverride,Pragma. The probability is near nil, but with the whole big bad internet waiting, maybe separate switches (--header and --header-add) would be better. as an alternative, we could choose to support multiple headers only for a few header types, like Pragma. however, i don't really like this second choice, as it would require hardcoding the above-mentioned header names in the wget sources, which IMVHO is a *VERY* bad practice. Same opinion: hard-coding the header list would be ugly and will bite some user in the nose some time in the future: if you need to add several XXXY headers, either patch and recompile or use at least version x.y. Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 / +39-041-5917073 ph -- +39-041-5907472 / +39-041-5917472 fax
Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not suppor ted]
Herold Heiko wrote: From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] i wonder if it makes sense to add generic support for multiple headers in wget, for instance by extending the --header option like this: wget --header=Pragma: xxx --header=dontoverride,Pragma: xxx2 someurl That could be a problem if you need to send a really weird custom header named dontoverride,Pragma. Probability is near nil but with the whole big bad internet waiting maybe separating switches (--header and --header-add) would be better. you're right. in fact, i like hrvoje's --append-header proposal better. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
[Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]
Hello, a forwarded report from http://bugs.debian.org/366434 could this behaviour be added to the doc/manpage? thx. Package: wget Version: 1.10.2-1 It's meaningful to have multiple 'Pragma:' headers within an http request, but wget will silently issue only a single one of them if they are specified within separate arguments. For example, [EMAIL PROTECTED] /tmp]$ wget -U 'NSPlayer/4.1.0.3856' --header='Pragma: no-cache,rate=1.00,stream-time=0,stream-offset=0:0,request-context=2,max-duration=0' --header='Pragma: xClientGUID={c77e7400-738a-11d2-9add-0020af0a3278}' --header='Pragma: xPlayStrm=1' --header='Pragma: stream-switch-count=1' --header='Pragma: stream-switch-entry=:1:0' http://wms.scripps.com:80/knoxville/siler/siler.mp3 ... doesn't work, and inspection with ethereal reveals that wget is only sending the last 'Pragma:' header given. Compressing all the 'Pragma' directives into a single header makes the fetch work: [EMAIL PROTECTED] /tmp]$ wget -U 'NSPlayer/4.1.0.3856' --header='Pragma: no-cache,rate=1.00,stream-time=0,stream-offset=0:0,request-context=2,max-duration=0,xClientGUID={c77e7400-738a-11d2-9add-0020af0a3278},xPlayStrm=1,stream-switch-count=1,stream-switch-entry=:1:0' http://wms.scripps.com:80/knoxville/siler/siler.mp3 -- Noèl Köthe noel debian.org Debian GNU/Linux, www.debian.org
bug in wget windows
done. == PORT ... done.== RETR SUSE-10.0-EvalDVD-i386-GM.iso ... done. [ = ] -673,009,664 113,23K/s Assertion failed: bytes >= 0, file retr.c, line 292 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
Re: bug in wget windows
Tobias Koeck wrote: done. == PORT ... done.== RETR SUSE-10.0-EvalDVD-i386-GM.iso ... done. [ = ] -673,009,664 113,23K/s Assertion failed: bytes >= 0, file retr.c, line 292 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. you are probably using an older version of wget, without large file support. please upgrade to wget 1.10.2.
a bug about wget
That is, there is HTML like this: <p>Click the following to go to the <a href="http://www.something.com/junk.asp?thepageIwant=2">next page</a>.</p> What I need is for wget to understand that stuff following a ? in a URL indicates that it's a distinctly different page, and it should go recursively retrieve it. The --recursive option and -A option don't seem to help. I had tried: wget -r -l2 -A junk.asp?the* and wget -r -l2 -A junk.asp%3Fthe* but neither command can download the file. Any help you can give me is appreciated. thanks
Re: openssl server renegotiation bug in wget
Thanks for the report; I've applied this patch:

2005-08-26  Jeremy Shapiro  [EMAIL PROTECTED]

	* openssl.c (ssl_init): Set SSL_MODE_AUTO_RETRY.

Index: openssl.c
===================================================================
--- openssl.c	(revision 2063)
+++ openssl.c	(working copy)
@@ -225,6 +225,10 @@
      handles them correctly), allow them in OpenSSL.  */
   SSL_CTX_set_mode (ssl_ctx, SSL_MODE_ENABLE_PARTIAL_WRITE);
 
+  /* The OpenSSL library can handle renegotiations automatically, so
+     tell it to do so.  */
+  SSL_CTX_set_mode (ssl_ctx, SSL_MODE_AUTO_RETRY);
+
   return true;
 
  error:
openssl server renegotiation bug in wget
I believe I've encountered a bug in wget. When using https, if the server does a renegotiation handshake wget fails trying to peek for the application data. This occurs because wget does not set the openssl context mode SSL_MODE_AUTO_RETRY. When I added the line: SSL_CTX_set_mode (ssl_ctx, SSL_MODE_AUTO_RETRY); just after the line that sets PARTIAL_WRITE mode in ssl_init() in openssl.c everything worked again. To reproduce, set up an apache server that only does client authentication for a protected directory. When wget does the ssl connect it negotiates the handshake. However, when it sends the request for the restricted directory the server will try to renegotiate with a client authenticated handshake. Wget will fail trying to read the application data, and continually retry. Jeremy
[Fwd: Bug#319088: wget: don't rely on exactly one blank char between size and month]
Hello, giuseppe wrote a patch for 1.10.1.beta1. Full report can be viewed here: http://bugs.debian.org/319088

-------- Forwarded message --------
From: giuseppe bonacci [EMAIL PROTECTED]
Reply-To: giuseppe bonacci [EMAIL PROTECTED], [EMAIL PROTECTED]
To: Debian Bug Tracking System [EMAIL PROTECTED]
Subject: Bug#319088: wget: don't rely on exactly one blank char between size and month
Date: Wed, 20 Jul 2005 10:26:20 +0200

Package: wget
Version: 1.10-3+1.10.1beta1
Followup-For: Bug #319088

A better patch is the following, which drops the assumption that there is exactly one blank char between size and month (implicit in the statement "char *t = tok - 2;"). As far as I know, strtok() modifies the string "1234 aaa bbb@" (where @ stands for \0, for clarity) so that when tok points to "aaa" the string looks like "1234@ aaa@ bbb@", and (tok - 2) points to a blank, which is not useful for backtracking. I think the best way to access the previous token is ... keeping a pointer to it.

g.b.

--- wget-1.10/src/ftp-ls.c.orig	2005-05-12 18:24:33.0 +0200
+++ wget-1.10/src/ftp-ls.c	2005-07-20 09:53:30.206791032 +0200
@@ -110,7 +110,7 @@
   struct tm timestruct, *tnow;
   time_t timenow;
 
-  char *line, *tok;		/* tokenizer */
+  char *line, *tok, *ptok;	/* tokenizer */
   struct fileinfo *dir, *l, cur; /* list creation */
 
   fp = fopen (file, "rb");
@@ -201,7 +201,9 @@
 	     This tactic is quite dubious when it comes to
 	     internationalization issues (non-English month names), but it
 	     works for now.  */
-	  while ((tok = strtok (NULL, " ")) != NULL)
+	  ptok = line;
+	  while (ptok = tok,
+		 (tok = strtok (NULL, " ")) != NULL)
 	    {
 	      --next;
 	      if (next < 0)	/* a month name was not encountered */
@@ -217,9 +219,7 @@
 		  /* Back up to the beginning of the previous token
 		     and parse it with str_to_wgint.  */
-		  char *t = tok - 2;
-		  while (t > line && ISDIGIT (*t))
-		    --t;
+		  char *t = ptok;
 		  if (t == line)
 		    {
 		      /* Something has gone wrong during parsing.
*/ -- System Information: Debian Release: testing/unstable APT prefers testing APT policy: (500, 'testing') Architecture: i386 (i686) Shell: /bin/sh linked to /bin/bash Kernel: Linux 2.6.8-2-686-smp Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) Versions of packages wget depends on: ii libc6 2.3.2.ds1-22 GNU C Library: Shared libraries an ii libssl0.9.7 0.9.7e-3 SSL shared libraries wget recommends no packages. -- no debconf information -- Noèl Köthe noel debian.org Debian GNU/Linux, www.debian.org
Re: Small bug in Wget manual page
On Wednesday 15 June 2005 04:57 pm, Ulf Harnhammar wrote: On Wed, Jun 15, 2005 at 03:53:40PM -0500, Mauro Tortonesi wrote: the web pages (including the documentation) on gnu.org have just been updated. Nice! I have found some broken links and strange grammar, though: * index.html: There are archives of the main GNU Wget list at ** fly.cc.fer.hr ** www.geocrawler.com (neither works) * wgetdev.html ** Translation Project page (doesn't work) * faq.html ** 3.1 [..] Yes, starting from version 1.10, GNU Wget support files larger than 2GB. (should be supports) fixed. thank you very much. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Small bug in Wget manual page
On Wednesday 15 June 2005 05:14 pm, Ulf Harnhammar wrote: On Wed, Jun 15, 2005 at 11:57:42PM +0200, Ulf Harnhammar wrote: * faq.html ** 3.1 [..] Yes, starting from version 1.10, GNU Wget support files larger than 2GB. (should be supports) ** 2.0 How I compile GNU Wget? (should be How do I) fixed. thank you very much. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Small bug in Wget manual page
Mauro Tortonesi [EMAIL PROTECTED] writes: this seems to be already fixed in the 1.10 documentation. Now that 1.10 is released, we should probably update the on-site documentation.
Re: Small bug in Wget manual page
On Wednesday 15 June 2005 02:05 pm, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: this seems to be already fixed in the 1.10 documentation. Now that 1.10 is released, we should probably update the on-site documentation. i am doing it right now. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Small bug in Wget manual page
On Wednesday 15 June 2005 02:16 pm, Mauro Tortonesi wrote: On Wednesday 15 June 2005 02:05 pm, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: this seems to be already fixed in the 1.10 documentation. Now that 1.10 is released, we should probably update the on-site documentation. i am doing it right now. the web pages (including the documentation) on gnu.org have just been updated. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: Small bug in Wget manual page
On Wed, Jun 15, 2005 at 03:53:40PM -0500, Mauro Tortonesi wrote: the web pages (including the documentation) on gnu.org have just been updated. Nice! I have found some broken links and strange grammar, though: * index.html: There are archives of the main GNU Wget list at ** fly.cc.fer.hr ** www.geocrawler.com (neither works) * wgetdev.html ** Translation Project page (doesn't work) * faq.html ** 3.1 [..] Yes, starting from version 1.10, GNU Wget support files larger than 2GB. (should be supports) // Ulf
Re: Small bug in Wget manual page
On Wed, Jun 15, 2005 at 11:57:42PM +0200, Ulf Harnhammar wrote: * faq.html ** 3.1 [..] Yes, starting from version 1.10, GNU Wget support files larger than 2GB. (should be supports) ** 2.0 How I compile GNU Wget? (should be How do I) // Ulf
Re: Small bug in Wget manual page
On Thursday 02 June 2005 09:33 am, Herb Schilling wrote: Hi, On http://www.gnu.org/software/wget/manual/wget.html, the section on protocol-directories has a paragraph that is a duplicate of the section on no-host-directories. Other than that, the manual is terrific! Wget is wonderful also. I don't know what I would do without it. --protocol-directories Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/ Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior. this seems to be already fixed in the 1.10 documentation. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Small bug in Wget manual page
Hi, On http://www.gnu.org/software/wget/manual/wget.html, the section on protocol-directories has a paragraph that is a duplicate of the section on no-host-directories. Other than that, the manual is terrific! Wget is wonderful also. I don't know what I would do without it. --protocol-directories Use the protocol name as a directory component of local file names. For example, with this option, wget -r http://host will save to http/host/... rather than just to host/ Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior. -- Herb Schilling NASA Glenn Research Center Brook Park, OH 44135 [EMAIL PROTECTED] If all our misfortunes were laid in one common heap whence everyone must take an equal portion, most people would be contented to take their own and depart. -Socrates (469?-399 B.C.)
Re: Serious retrieval bug in wget 1.9.1 and newer
Wget doesn't recognize the image tag, Aah, thanks. Should Wget support it to be compatible? IMHO yes. Thanks for your help. Werner
Serious retrieval bug in wget 1.9.1 and newer
[CVS 2005-05-25] I tried this command: wget -r -L -l1 freetype.freedesktop.org/freetype2/screenshots.html directly from the build directory, without using a .wgetrc file. In the file `screenshots.html' there is a reference to the file ../image/ft2-kde-thumb.png (and others) which wget simply doesn't download -- no error message, no warning. My Mozilla browser displays the page just fine. Since wget downloads the first thumbnail picture `../image/ft2-nautilus-thumb.png' without problems I suspect a serious bug in wget. I'm running wget on a GNU/Linux box. BTW, it is not possible for CVS wget to have builddir != srcdir (after creating the configure script), which is bad IMHO. Werner
Re: Serious retrieval bug in wget 1.9.1 and newer
Werner LEMBERG [EMAIL PROTECTED] writes: directly from the build directory, without using a .wgetrc file. In the file `screenshots.html' there is a reference to the file ../image/ft2-kde-thumb.png The reference looks like this: <image width=160 height=120 alt="KDE screenshot" src="../image/ft2-kde-thumb.png"> Wget doesn't recognize the image tag, which I've never heard of before. It's not mentioned in HTML 4.01, and it seems to be missing from the various documents listing IE and Netscape extensions to HTML. Googling for "image tag" reveals a number of hits that really refer to IMG. Mozilla and Opera do support it, so there's obviously some history behind the tag. Has anyone heard of it before? Should Wget support it to be compatible? (and others) which wget simply doesn't download -- no error message, no warning. My Mozilla browser displays the page just fine. Since wget downloads the first thumbnail picture `../image/ft2-nautilus-thumb.png' without problems I suspect a serious bug in wget. ft2-nautilus-thumb.png is referenced using the regular img tag. BTW, it is not possible for CVS wget to have builddir != srcdir (after creating the configure script), which is bad IMHO. It seems to work here, except for the case when you build Wget in srcdir as well.
Is this a bug in wget ? I need an urgent help!
I try to do something like wget http://website.com/ ... login=username&domain=hotmail%2ecom&_lang=EN But when wget sends the URL out, the hotmail%2ecom becomes hotmail.com !!! Is this the supposed behaviour ? I saw this on the sniffer. I suppose the translation of %2 to . is done by wget. Because of this, wget cannot retrieve the document. How can I force wget to send out the URL as it is, without making any translation ??!
Re: Is this a bug in wget ? I need an urgent help!
Will Kuhn [EMAIL PROTECTED] writes: I try to do something like wget http://website.com/ ... login=username&domain=hotmail%2ecom&_lang=EN But when wget sends the URL out, the hotmail%2ecom becomes hotmail.com !!! Is this the supposed behaviour ? Yes. I saw this on the sniffer. I suppose the translation of %2 to . is done by wget. Actually, %2e is translated to `.'. Since 2e is the ASCII hex code corresponding to the `.' character, the two are entirely equivalent. Are you sure that the download doesn't fail for some other unrelated reason? How can I force wget to send out the URL as it is without making any translation ??! Some translation must be done, for example spaces must be converted to %20, and so on. During that course Wget translates regular characters represented by hex codes into regular characters. If you don't like it, you can hack url.c:decide_copy_method to always return CM_PASSTHROUGH upon encountering a %XX sequence.
Re: Is this a bug in wget ? I need an urgent help!
Hrvoje Niksic [EMAIL PROTECTED] writes: Can I have it not do the translation ??! Unfortunately, only by changing the source code as described in the previous mail. BTW I've just changed the CVS code to not decode the % sequences. Wget 1.10 will contain the fix.
[Fwd: Bug#197916: wget: Mutual incompatibility between arguments -k and -O]
Hello, here a bugreport: (http://bugs.debian.org/197916) -------- Forwarded message -------- From: Antoni Bella Perez [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Bug#197916: wget: Mutual incompatibility between arguments -k and -O Date: Wed, 18 Jun 2003 16:49:22 +0200 Package: wget Version: 1.8.2-10 Severity: important These are the arguments: ## ARGUMENTS `-O FILE' `--output-document=FILE' ## `-k' `--convert-links' ## I have created a script following the documentation (man and info), and the command line scheme that I specified does not work: wget -k URL -O file.html Below I show the output of the command: ## BUG [16:28:43] [EMAIL PROTECTED]:~$ wget -k http://www.terra.es/personal7/bella5/ -O index.html --16:28:48-- http://www.terra.es/personal7/bella5/ => `index.html' Resolving www.terra.es... done. Connecting to www.terra.es[213.4.130.210]:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] [ <=> ] 11,137 26.08K/s 16:29:04 (26.08 KB/s) - `index.html' saved [11137] index.html.1: No such file or directory Converting index.html.1... nothing to do. Converted 1 files in 0.00 seconds. ## END I have lost a long time with this, so please consider: either it is a bug, or it should be documented. Regards Toni -- Noèl Köthe noel debian.org Debian GNU/Linux, www.debian.org
[Fwd: Bug#182957: wget: manual page doesn't document type of patterns for --rejlist, --acclist]
Hello, maybe someone can document this (http://bugs.debian.org/182957) in one or two sentences in wget.texi. thx. -------- Forwarded message -------- From: Daniel B. dsb smart.net ... The wget manual page doesn't document the format of the comma-separated values for the --rejlist and --acclist options. The wget manual page says: -A acclist --accept acclist -R rejlist --reject rejlist Specify comma-separated lists of file name suffixes or patterns to accept or reject. Particular unanswered questions are: - Whether pattern means shell (globbing) pattern or regular expression. - If it means regular expression: - Which style of regular expression (basic, extended, Perl 5, other). - Whether the expression is anchored or not. - Whether suffix means xyz in abc.xyz, .xyz in abc.xyz, or any string found at the end of the candidate string (e.g., yz in abc.xyz or in DJayz). - How a suffix is differentiated from a pattern. -- Noèl Köthe noel debian.org Debian GNU/Linux, www.debian.org
Re: Bug in wget 1.9.1 documentation
Tristan Miller [EMAIL PROTECTED] writes: There appears to be a bug in the documentation (man page, etc.) for wget 1.9.1. I think this is a bug in the man page generation process.
Bug in wget 1.9.1 documentation
Greetings. There appears to be a bug in the documentation (man page, etc.) for wget 1.9.1. Specifically, the section about the command-line option for proxies ends abruptly: -Y on/off --proxy=on/off Turn proxy support on or off. The proxy is on by default if the appropriate environment variable is defined. For more information about the use of proxies with Wget, -Q quota --quota=quota Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with k suffix), or megabytes (with m suffix). Regards, Tristan -- _ _V.-o Tristan Miller [en,(fr,de,ia)]Space is limited / |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=In a haiku, so it's hard (7_\\http://www.nothingisreal.com/ To finish what you
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: The request log shows that the slashes are apparently respected. I retried a test case and found the same thing -- the slashes were respected. OK. Then I remembered that I was using -i. Wget seems to work fine with the url on the command line; the bug only happens when the url is passed in with: cat <<EOF | wget -i - http://... EOF But I cannot repeat that, either. As long as the consecutive slashes are in the query string, they're not stripped. Using this method is necessary since it is the ONLY secure way I know of to do a password-protected http request from a shell script. Yes, that is the best way to do it.
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote: Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.
--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ...

The request log shows that the slashes are apparently respected. I retried a test case and found the same thing -- the slashes were respected. Then I remembered that I was using -i. Wget seems to work fine with the url on the command line; the bug only happens when the url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know of to do a password-protected http request from a shell script. Otherwise the password appears on the command line... Rich
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: The following code in url.c makes it impossible to request urls that contain multiple slashes in a row in their query string: [...] That code is removed in CVS, so multiple slashes now work correctly. Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output.
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote: D Richard Felker III [EMAIL PROTECTED] writes: The following code in url.c makes it impossible to request urls that contain multiple slashes in a row in their query string: [...] That code is removed in CVS, so multiple slashes now work correctly. Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and it persisted. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Rich
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III [EMAIL PROTECTED] writes: Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: [...] Which version of Wget are you using? I think even Wget 1.8.2 didn't collapse multiple slashes in query strings, only in paths. I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and it persisted. OK. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. That's exactly what should happen. Please give us more details, if possible accompanied by `-d' output. If you'd still like details now that you know the version I was using, let me know and I'll be happy to do some tests. Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.
--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ...

The request log shows that the slashes are apparently respected.
Bug in wget: cannot request urls with double-slash in the query string
The following code in url.c makes it impossible to request urls that contain multiple slashes in a row in their query string:

      else if (*h == '/')
	{
	  /* Ignore empty path elements.  Supporting them well is hard
	     (where do you save "http://x.com///y.html"?), and they
	     don't bring any practical gain.  Plus, they break our
	     filesystem-influenced assumptions: allowing them would
	     make "x/y//../z" simplify to "x/y/z", whereas most people
	     would expect "x/z".  */
	  ++h;
	}

Think of something like http://foo/bar/redirect.cgi?http://... wget translates this into: http://foo/bar/redirect.cgi?http:/... and then the web server of course gives an error. Note that the problem occurs even if the slashes were url escaped, since wget unescapes them. Removing the offending code fixes the problem, but I'm not sure if this is the correct solution. I expect it would be more correct to remove multiple slashes only before the first occurrence of ?, but not afterwards. Rich
bug in wget 1.8.1/1.8.2
Hello, I use an extra file with a long list of http entries. I included this file with the -i option. After 154 downloads I got an error message: Segmentation fault. With wget 1.7.1 everything works well. Is there a new limit on the number of lines? Regards, Dieter Drossmann
Re: bug in wget 1.8.1/1.8.2
Dieter Drossmann [EMAIL PROTECTED] writes: I use an extra file with a long list of http entries. I included this file with the -i option. After 154 downloads I got an error message: Segmentation fault. With wget 1.7.1 everything works well. Is there a new limit on the number of lines? No, there's no built-in line limit; what you're seeing is a bug. I cannot see anything wrong inspecting the code, so you'll have to help by providing a gdb backtrace. You can get it by doing this: * Compile Wget with `-g' by running `make CFLAGS=-g' in its source directory (after configure, of course.) * Go to the src/ directory and run that version of Wget the same way you normally run it, e.g. ./wget -i FILE. * When Wget crashes, run `gdb wget core', type `bt' and mail us the resulting stack trace. Thanks for the report.
bug in wget - wget break on time msec=0
Hello, I think I found a bug in wget. My GNU wget version is 1.8.2. My system: GNU/Debian unstable. I use wget to replay our apache logfiles to a test webserver to try different tuning parameters. Wget fails to run through the logfile and exits with the error message that the assertion `msecs >= 0' failed. This is the command I run:

#time wget -q -i replaylog -O /dev/null

Here is the output of strace:

#time strace wget -q -i replaylog -O /dev/null
read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\354\0\21"..., 3792) = 3792
gettimeofday({1063461157, 858103}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
write(3, "\377\0\344=\217\355V\\\232\363\16\221\255\336h\227\361"..., 1435) = 1435
gettimeofday({1063461157, 858783}, NULL) = 0
time(NULL) = 1063461157
access("390564.jpg?time=1060510404", F_OK) = -1 ENOENT (No such file or directory)
time(NULL) = 1063461157
select(5, [4], NULL, NULL, {0, 1}) = 0 (Timeout)
time(NULL) = 1063461157
select(5, NULL, [4], [4], {900, 0}) = 1 (out [4], left {900, 0})
write(4, "GET /fotos/4/390564.jpg?time=106"..., 244) = 244
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "HTTP/1.1 200 OK\r\nDate: Sat, 13 S"..., 4096) = 4096
write(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\333\0C"..., 3792) = 3792
gettimeofday({1063461157, 880833}, NULL) = 0
select(5, [4], NULL, [4], {900, 0}) = 1 (in [4], left {900, 0})
read(4, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
write(3, "\343P\223\36T\4\203Rc\317\257J\4x\2165\303;o\211\256+\222"..., 817) = 817
gettimeofday({1063461157, 874729}, NULL) = 0
time(NULL) = 1063461157
write(2, "wget: retr.c:262: calc_rate: Ass"..., 60wget: retr.c:262: calc_rate: Assertion `msecs >= 0' failed.
) = 60
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
getpid() = 7106
kill(7106, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++

I hope that helps. Keep up the good work. Kind regards, Gunnar
Re: bug in wget - wget break on time msec=0
Boehn, Gunnar von [EMAIL PROTECTED] writes: I think I found a bug in wget. You did. But I believe your subject line is slightly incorrect: Wget handles 0-length time intervals (see the assert message), but what it doesn't handle are negative ones. And indeed:

gettimeofday({1063461157, 858103}, NULL) = 0
gettimeofday({1063461157, 858783}, NULL) = 0
gettimeofday({1063461157, 880833}, NULL) = 0
gettimeofday({1063461157, 874729}, NULL) = 0

As you can see, the last gettimeofday returned a time *preceding* the one before it. Your ntp daemon must have chosen that precise moment to set the system clock back by ~6 milliseconds, to which Wget reacted badly. Even so, Wget shouldn't crash. The correct fix is to disallow the timer code from ever returning decreasing or negative time intervals. Please let me know if this patch fixes the problem:

2003-09-14  Hrvoje Niksic  [EMAIL PROTECTED]

	* utils.c (wtimer_sys_set): Extracted the code that sets the
	current time here.
	(wtimer_reset): Call it.
	(wtimer_sys_diff): Extracted the code that calculates the
	difference between two system times here.
	(wtimer_elapsed): Call it.
	(wtimer_elapsed): Don't return a value smaller than the previous
	one, which could previously happen when system time is set back.
	Instead, reset start time to current time and note the elapsed
	offset for future calculations.  The returned times are now
	guaranteed to be monotonically nondecreasing.
Index: src/utils.c
===
RCS file: /pack/anoncvs/wget/src/utils.c,v
retrieving revision 1.51
diff -u -r1.51 utils.c
--- src/utils.c	2002/05/18 02:16:25	1.51
+++ src/utils.c	2003/09/13 23:09:13
@@ -1532,19 +1532,30 @@
 # endif
 #endif /* not WINDOWS */
 
-struct wget_timer {
 #ifdef TIMER_GETTIMEOFDAY
-  long secs;
-  long usecs;
+typedef struct timeval wget_sys_time;
 #endif
 
 #ifdef TIMER_TIME
-  time_t secs;
+typedef time_t wget_sys_time;
 #endif
 
 #ifdef TIMER_WINDOWS
-  ULARGE_INTEGER wintime;
+typedef ULARGE_INTEGER wget_sys_time;
 #endif
+
+struct wget_timer {
+  /* The starting point in time which, subtracted from the current
+     time, yields elapsed time. */
+  wget_sys_time start;
+
+  /* The most recent elapsed time, calculated by wtimer_elapsed().
+     Measured in milliseconds. */
+  long elapsed_last;
+
+  /* Approximately, the time elapsed between the true start of the
+     measurement and the time represented by START. */
+  long elapsed_pre_start;
 };
 
 /* Allocate a timer.  It is not legal to do anything with a freshly
@@ -1577,22 +1588,17 @@
   xfree (wt);
 }
 
-/* Reset timer WT.  This establishes the starting point from which
-   wtimer_elapsed() will return the number of elapsed
-   milliseconds.  It is allowed to reset a previously used timer. */
+/* Store system time to WST. */
 
-void
-wtimer_reset (struct wget_timer *wt)
+static void
+wtimer_sys_set (wget_sys_time *wst)
 {
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  wt->secs  = t.tv_sec;
-  wt->usecs = t.tv_usec;
+  gettimeofday (wst, NULL);
 #endif
 
 #ifdef TIMER_TIME
-  wt->secs = time (NULL);
+  time (wst);
 #endif
 
 #ifdef TIMER_WINDOWS
@@ -1600,39 +1606,76 @@
   SYSTEMTIME st;
   GetSystemTime (&st);
   SystemTimeToFileTime (&st, &ft);
-  wt->wintime.HighPart = ft.dwHighDateTime;
-  wt->wintime.LowPart  = ft.dwLowDateTime;
+  wst->HighPart = ft.dwHighDateTime;
+  wst->LowPart  = ft.dwLowDateTime;
 #endif
 }
 
-/* Return the number of milliseconds elapsed since the timer was last
-   reset.  It is allowed to call this function more than once to get
-   increasingly higher elapsed values.  */
+/* Reset timer WT.  This establishes the starting point from which
+   wtimer_elapsed() will return the number of elapsed
+   milliseconds.  It is allowed to reset a previously used timer. */
 
-long
-wtimer_elapsed (struct wget_timer *wt)
+void
+wtimer_reset (struct wget_timer *wt)
 {
+  /* Set the start time to the current time. */
+  wtimer_sys_set (&wt->start);
+  wt->elapsed_last = 0;
+  wt->elapsed_pre_start = 0;
+}
+
+static long
+wtimer_sys_diff (wget_sys_time *wst1, wget_sys_time *wst2)
+{
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  return (t.tv_sec - wt->secs) * 1000 + (t.tv_usec - wt->usecs) / 1000;
+  return ((wst1->tv_sec - wst2->tv_sec) * 1000
+          + (wst1->tv_usec - wst2->tv_usec) / 1000);
#endif
 
 #ifdef TIMER_TIME
-  time_t now = time (NULL);
-  return 1000 * (now - wt->secs);
+  return 1000 * (*wst1 - *wst2);
 #endif
 
 #ifdef WINDOWS
-  FILETIME ft;
-  SYSTEMTIME st;
-  ULARGE_INTEGER uli;
-  GetSystemTime (&st);
-  SystemTimeToFileTime (&st, &ft);
-  uli.HighPart = ft.dwHighDateTime;
-  uli.LowPart = ft.dwLowDateTime;
-  return (long)((uli.QuadPart - wt->wintime.QuadPart) / 1);
+  return (long)(wst1->QuadPart - wst2->QuadPart) / 1;
 #endif
+}
+
+/* Return the number of milliseconds
Maybe a bug in wget?
Dear Sir; We are using wget-1.8.2 and it's very convenient for our routine programs. However, we now have trouble with the return code from wget when using the -r option: when wget with -r fails on an FTP connection, it returns exit code 0; without -r, it returns 1. We looked over the source and found a suspicious line in ftp.c:

ftp.c
1699	  if ((opt.ftp_glob && wild) || opt.recursive || opt.timestamping)
1700	    {
1701	      /* ftp_retrieve_glob is a catch-all function that gets called
1702	         if we need globbing, time-stamping or recursion.  Its
1703	         third argument is just what we really need.  */
1704	      ftp_retrieve_glob (u, con,
1705	                         (opt.ftp_glob && wild) ? GLOBALL : GETONE);
1706	    }
1707	  else
1708	    res = ftp_loop_internal (u, NULL, con);

We guess that line 1704 should read as follows, in order to return the error code back to the main function:

1704	      res = ftp_retrieve_glob (u, con,
1705	                               (opt.ftp_glob && wild) ? GLOBALL : GETONE);

Is this right? If we change ftp.c in this way, would any other problems occur? Best Regards, Norihisa Fujikawa, Programming Section in Numerical Prediction Division, Japan Meteorological Agency
*** Workaround found ! *** (was: Hostname bug in wget ...)
Hi, I found a workaround for the problem described below: using the option -nh does the job for me. As the subdomains mentioned below are on the same IP as the main domain, wget seems not to compare their names but only the IP. If you need more info please let me know. Have a nice weekend! Regards Klaus --- Forwarded message follows --- From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Date sent: Thu, 4 Sep 2003 12:53:39 +0200 Subject: Hostname bug in wget ... Priority: normal ... or a silly sleepless webmaster!? Hi, Version: I use GNU wget version 1.7 as found on the OpenBSD release 3.3 CD, on the i386 architecture. How to reproduce: wget -r coolibri.com (adding the span-hosts option did not help). Problem category: there seems to be a problem with prepending the wrong hostname. Problem in more detail: between fine GETs there are lots of 404s caused by prepending the wrong hostname. The website consists of several parts distributed over several subdomains: coolibri.com, cpu-kuehler.coolibri.com, luefter.coolibri.com, etc. Example: wget tries to get files that are located on cpu-kuehler.coolibri.com, but prepends not cpu-kuehler.coolibri.com but only coolibri.com. Instead of (correct) http://cpu-kuehler.coolibri.com/80_Kuehler_Grafik_Grafikkarte_/80_kuehler_grafik_grafikkarte_.html it tries (incorrect) http://coolibri.com/80_Kuehler_Grafik_Grafikkarte_/80_kuehler_grafik_grafikkarte_.html Tried my best not to waste your time - but some lack of sleep during the last week was not really helpful ;-) Best regards Klaus --- End of forwarded message ---
Re: *** Workaround found ! *** (was: Hostname bug in wget ...)
[EMAIL PROTECTED] writes: I found a workaround for the problem described below. Using option -nh does the job for me. As the subdomains mentioned below are on the same IP as the main domain wget seems not to compare their names but the IP only. I believe newer versions of Wget don't do that anymore. At the time Wget was originally written, DNS-based virtual hosting was still in its infancy. Nowadays almost everyone does it, so what used to be `-nh' became the default. Either way, thanks for the report.
A small bug in wget
The bug appears if you use another output file and try to convert the URLs at the same time. If you execute the following: wget -k -O myFile http://www.stud.ntnu.no/index.html the file will not be converted, because wget does not locate the file index.html: the output file is not index.html but myFile.
Bug in wget version 1.8.1
Hello. In wget version 1.8.1 I got a segfault after executing: $ wget -c -r -k http://www.repairfaq.orghttp://www.repairfaq.org The bug is probably triggered by the two http URLs run together on the command line. I've attached strace output, but there's rather nothing useful in it. I have no source code for this version of wget, so I'm not able to check it out now. If you need any other, additional information from me, just mail me. -- Regards Michal Byrecki wget-out Description: Binary data
Bug in wget version 1.8.1
Hello again, regarding wget version 1.8.1: I downloaded the source code of wget 1.8.1, so I can now tell you more about this bug :) Here's more data:

(gdb) set args -c -r -k http://www.repairfaq.orghttp://www.repairfaq.org
(gdb) run
Starting program: /home/byrek/testy/wget-1.8.1/src/wget -c -r -k http://www.repairfaq.orghttp://www.repairfaq.org

Program received signal SIGSEGV, Segmentation fault.
0x0805ca6c in retrieve_tree (start_url=0x8077b98 "http://www.repairfaq.orghttp://www.repairfaq.org") at recur.c:201
201       url_enqueue (queue, xstrdup (start_url_parsed->url), NULL, 0);
(gdb) bt
#0  0x0805ca6c in retrieve_tree (start_url=0x8077b98 "http://www.repairfaq.orghttp://www.repairfaq.org") at recur.c:201
#1  0x0805a499 in main (argc=-1073743340, argv=0xbb24) at main.c:812
(gdb)

Since my clock shows 3:52 AM I'm not able to analyze anything except the route to my bed, so I didn't figure out what's wrong. I hope you will. -- Regards Michal Byrecki
possible bug in wget?
error description: wget aborts with a segmentation violation while I try to get some files recursively: wget -r -l1 http://somewhere/somewhat.htm

(gdb) where
#0 0x080532a2 in fnmatch ()
#1 0x08065788 in fnmatch ()
#2 0x0805e523 in fnmatch ()
#3 0x08060da7 in fnmatch ()
#4 0x0805c733 in fnmatch ()
#5 0x0804a295 in getsockname ()

I looked into the HTML file and determined that one link appeared there twice; at this position wget crashes... My configuration: FreeBSD 5.0-RELEASE-p1, GNU Wget 1.8.2 (built from the BSD ports collection). thanx unicorn
RE: Bug with wget ? I need help.
Try `telnet www.sosi.cnrs.fr 80'; if it connects, type `GET / HTTP/1.0' followed by two newlines. If you don't get the output of the webserver, you probably have a routing problem or something else. Heiko -- -- PREVINET S.p.A. [EMAIL PROTECTED] -- Via Ferretto, 1 ph x39-041-5907073 -- I-31021 Mogliano V.to (TV) fax x39-041-5907472 -- ITALY -----Original Message----- From: Cédric Rosa [mailto:[EMAIL PROTECTED]] Sent: Friday, June 21, 2002 4:37 PM To: [EMAIL PROTECTED] Subject: Bug with wget ? I need help. Hello, First, excuse my English, I'm French. When I try with wget (v 1.8.1) to download a URL which is behind a router, the software waits forever even though I've specified a timeout. With Ethereal, I've seen that there is no response from the server (the ACK never appears). Here is the debug output: rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr --16:30:54-- http://www.sosi.cnrs.fr/ => `index.html' Resolving www.sosi.cnrs.fr... done. Connecting to www.sosi.cnrs.fr[193.55.87.37]:80... Thanks in advance for your help. Cedric Rosa.
Fwd: Bug with wget ? I need help.
It seems to be the default timer, which can't be overridden. --17:12:19-- http://www.sosi.cnrs.fr/ => `index.html' Resolving www.sosi.cnrs.fr... done. Connecting to www.sosi.cnrs.fr[193.55.87.37]:80... failed: Connection timed out. Giving up. --17:26-- Can someone reproduce this problem? Date: Fri, 21 Jun 2002 16:37:02 +0200 To: [EMAIL PROTECTED] From: Cédric Rosa [EMAIL PROTECTED] Subject: Bug with wget ? I need help. Hello, First, excuse my English, I'm French. When I try with wget (v 1.8.1) to download a URL which is behind a router, the software waits forever even though I've specified a timeout. With Ethereal, I've seen that there is no response from the server (the ACK never appears). Here is the debug output: rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr --16:30:54-- http://www.sosi.cnrs.fr/ => `index.html' Resolving www.sosi.cnrs.fr... done. Connecting to www.sosi.cnrs.fr[193.55.87.37]:80... Thanks in advance for your help. Cedric Rosa.
Re: Bug with wget ? I need help.
Cédric Rosa wrote: Hello, First, excuse my English, I'm French. When I try with wget (v 1.8.1) to download a URL which is behind a router, the software waits forever even though I've specified a timeout. With Ethereal, I've seen that there is no response from the server (the ACK never appears). This is documented behavior: because of implementation issues, the timeout does not cover establishing the connection, only responses after a connection has been established. For version 1.9 the timeout option will also cover the connection. http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/NEWS?rev=HEAD&content-type=text/plain Here is the debug output: rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr --16:30:54-- http://www.sosi.cnrs.fr/ => `index.html' Resolving www.sosi.cnrs.fr... done. Connecting to www.sosi.cnrs.fr[193.55.87.37]:80... Thanks in advance for your help. Cedric Rosa. -- Med venlig hilsen / Kind regards Hack Kampbjørn
Re: Bug with wget ? I need help.
thanks for your help :) I'm installing version 1.9 to check. I think this update may solve my problem. Cedric Rosa. - Original Message - From: Hack Kampbjørn [EMAIL PROTECTED] To: Cédric Rosa [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, June 21, 2002 7:27 PM Subject: Re: Bug with wget ? I need help. Cédric Rosa wrote: Hello, First, scuse my english but I'm french. When I try with wget (v 1.8.1) to download an url which is behind a router, the software wait for ever even if I've specified a timeout. With ethereal, I've seen that there is no response from the server (ACK never appears). This a documented behavior, because of programming issues the timeout does not cover the connection but only response after a connection has been established. For version 1.9 the timeout option will also cover the connection. http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/NEWS?rev=HEADcontent-type =text/plain Here is the debug output: rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr --16:30:54-- http://www.sosi.cnrs.fr/ = `index.html' Resolving www.sosi.cnrs.fr... done. Connecting to www.sosi.cnrs.fr[193.55.87.37]:80... Thanks by advance for your help. Cedric Rosa. -- Med venlig hilsen / Kind regards Hack Kampbjørn
(fwd) Bug#149075: wget: option for setting tcp window size
Hello, I got this feature request: http://bugs.debian.org/149075 - Forwarded message from Erno Kuusela [EMAIL PROTECTED] - hello, it would be really useful to be able to set the tcp window size for wget, since the default window size can be much too small for long latency links. also setting the window size to less than the default would in effect work as a rate limiter. the window size can be set with the SOL_SOCKET/SO_RCVBUF socket option. - End forwarded message - -- Noèl Köthe
Re: small bug in wget manpage: --progress
Noel Koethe [EMAIL PROTECTED] writes: the wget 1.8.1 manpage tells me: --progress=type Select the type of the progress indicator you wish to use. Legal indicators are ``dot'' and ``bar''. The ``dot'' indicator is used by default. It traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data. But it looks like the default is bar. Yes. Thanks for the report; I'm about to apply this fix.

2002-04-15  Hrvoje Niksic  [EMAIL PROTECTED]

	* wget.texi (Download Options): Fix the documentation of
	`--progress'.

Index: doc/wget.texi
===
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.64
diff -u -r1.64 wget.texi
--- doc/wget.texi	2002/04/13 22:44:16	1.64
+++ doc/wget.texi	2002/04/15 20:52:28
@@ -625,10 +625,15 @@
 Select the type of the progress indicator you wish to use.  Legal
 indicators are ``dot'' and ``bar''.
 
-The ``dot'' indicator is used by default.  It traces the retrieval by
-printing dots on the screen, each dot representing a fixed amount of
-downloaded data.
+The ``bar'' indicator is used by default.  It draws an ASCII progress
+bar graphics (a.k.a ``thermometer'' display) indicating the status of
+retrieval.  If the output is not a TTY, the ``dot'' bar will be used by
+default.
 
+Use @samp{--progress=dot} to switch to the ``dot'' display.  It traces
+the retrieval by printing dots on the screen, each dot representing a
+fixed amount of downloaded data.
+
 When using the dotted retrieval, you may also set the @dfn{style} by
 specifying the type as @samp{dot:@var{style}}.  Different styles assign
 different meaning to one dot.  With the @code{default} style each dot
@@ -639,11 +644,11 @@
 files---each dot represents 64K retrieved, there are eight dots in a
 cluster, and 48 dots on each line (so each line contains 3M).
 
-Specifying @samp{--progress=bar} will draw a nice ASCII progress bar
-graphics (a.k.a ``thermometer'' display) to indicate retrieval.  If the
-output is not a TTY, this option will be ignored, and Wget will revert
-to the dot indicator.  If you want to force the bar indicator, use
-@samp{--progress=bar:force}.
+Note that you can set the default style using the @code{progress}
+command in @file{.wgetrc}.  That setting may be overridden from the
+command line.  The exception is that, when the output is not a TTY, the
+``dot'' progress will be favored over ``bar''.  To force the bar output,
+use @samp{--progress=bar:force}.
 
 @item -N
 @itemx --timestamping
Re: Debian wishlist bug 21148 - wget doesn't allow selectivity based on mime type
I believe this is already on the TODO list. However, it is made harder by the fact that, to implement this kind of rejection, you have to start downloading the file. This is very different from filename-based rejection, where the decision can be made at a very early point in the download process.
Re: debian bug 32712 - wget -m sets atime to remote mtime.
Good point there. I wonder... is there a legitimate reason to require atime to be set to the mtime? If not, we could just make the change without a new option. In general I'm careful not to add new options unless they're really necessary.
Re: Debian bug 55145 - wget gets confused by redirects
Guillaume Morin [EMAIL PROTECTED] writes: If wget fetches a url which redirects to another host, wget retrieves the file, and there's nothing that can be done to turn that off. So, if you do wget -r on a machine that happens to have a redirect to www.yahoo.com, you'll wind up trying to pull down a big chunk of Yahoo. Hmm. Are you sure? Wget 1.8.1 tries hard to restrict following redirections by applying the same rules normally used for following links. Downloading half of Yahoo! because someone redirects to www.yahoo.com is not intended to happen. I tried to reproduce it by creating a page that redirects to www.yahoo.com, but Wget behaved correctly:

$ wget -r -l0 http://muc.arsdigita.com:2005/test.tcl
--19:13:53-- http://muc.arsdigita.com:2005/test.tcl
           => `muc.arsdigita.com:2005/test.tcl'
Resolving muc.arsdigita.com... done.
Connecting to muc.arsdigita.com[212.84.246.68]:2005... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.yahoo.com [following]
--19:13:53-- http://www.yahoo.com/
           => `www.yahoo.com/index.html'
Resolving www.yahoo.com... done.
Connecting to www.yahoo.com[64.58.76.223]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=> ] 16,829  22.39K/s

19:13:55 (22.39 KB/s) - `www.yahoo.com/index.html' saved [16829]

FINISHED --19:13:55--
Downloaded: 16,829 bytes in 1 files

Guillaume, exactly how have you reproduced the problem?
suspected bug in WGET 1.8.1
I'm using the NT port of WGET 1.8.1. FTP retrieval of files works fine; retrieval of directory listings fails. The problem happens under certain conditions when connecting to OS/2 FTP servers. For example, if the current directory on the FTP server at login time is e:/abc, the command wget "ftp://userid:password@ipaddr/g:\def\test.doc" works fine to retrieve the file, but the command wget "ftp://userid:password@ipaddr/g:\def\" fails to retrieve the directory listing. For what it's worth, g:\def/, g:/def\ and g:/def/ also fail. Matt Jackson (919) 254-4547 [EMAIL PROTECTED]
small bug in wget manpage: --progress
Hello, the wget 1.8.1 manpage tells me: --progress=type Select the type of the progress indicator you wish to use. Legal indicators are ``dot'' and ``bar''. The ``dot'' indicator is used by default. It traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data. But it looks like the default is bar. thx. -- Noèl Köthe
Debian bug 113281 - wget doesn't wait when retrying
Hi, I am forwarding Debian bug 113281: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=113281&repeatmerged=yes It still applies to 1.8.1. I am sure it is a bug, though: wget doesn't wait when retrying to connect to an FTP server. Not sure if this affects HTTP downloads. In the case shown below, I was attempting to download from a server that had reached its user limit. wget would retry every second or two, eventually resulting in my system being temporarily banned from the server (the last attempt reflects this change). I tried this with both `wget -w 40 url' (which should be the same as `wget --wait=40 url') and `wget --waitretry=40 url'.

[mike@3po][~/download]$ wget --waitretry=40 ftp://ftp.idsoftware.com/idstuff/wolf/linux/wolfmptest-0.7.16-1.x86.run
--15:02:37-- ftp://ftp.idsoftware.com/idstuff/wolf/linux/wolfmptest-0.7.16-1.x86.run
           => `wolfmptest-0.7.16-1.x86.run'
Connecting to ftp.idsoftware.com:21... connected!
Logging in as anonymous ... The server refuses login.
Retrying.

--15:02:38-- ftp://ftp.idsoftware.com/idstuff/wolf/linux/wolfmptest-0.7.16-1.x86.run (try: 2)
           => `wolfmptest-0.7.16-1.x86.run'
Connecting to ftp.idsoftware.com:21... connected!
Logging in as anonymous ... The server refuses login.
Retrying.

--15:02:41-- ftp://ftp.idsoftware.com/idstuff/wolf/linux/wolfmptest-0.7.16-1.x86.run (try: 3)
           => `wolfmptest-0.7.16-1.x86.run'
Connecting to ftp.idsoftware.com:21... connected!
Logging in as anonymous ... The server refuses login.
Retrying.

--15:02:45-- ftp://ftp.idsoftware.com/idstuff/wolf/linux/wolfmptest-0.7.16-1.x86.run (try: 4)
           => `wolfmptest-0.7.16-1.x86.run'
Connecting to ftp.idsoftware.com:21... connected!
Logging in as anonymous ... Error in server response, closing control connection.
Retrying.

Please keep [EMAIL PROTECTED] CC'ed. -- Guillaume Morin [EMAIL PROTECTED] I am the saddest kid in grade number two (Lisa Simpsons)
Debian wishlist bug 21148 - wget doesn't allow selectivity based on mime type
Hi, I am forwarding Debian wishlist bug 21148: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=21148&repeatmerged=yes While wget allows me to include/exclude documents based on their extension, it doesn't allow me to do the same based on MIME type (for example, if I only want to save text/* documents). Please keep [EMAIL PROTECTED] CC'ed. -- Guillaume Morin [EMAIL PROTECTED] Justice is lost, Justice is raped, Justice is done. (Metallica)
Debian bug 117774 - wget returns 0 even when failing when using wildcards
Hi, I am forwarding you this bug. I can reproduce it on 1.8.1: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=117774&repeatmerged=yes --- wget seems always to return 0 as the exit code even when it fails, but AFAIK only when using some wildcard character in the URL. For example:

spiney:~ $ wget --use-proxy=off "ftp://this.should.be.enough.spiney.org/README?"
--14:18:18-- ftp://this.should.be.enough.spiney.org/README?
           => `.listing'
Connecting to this.should.be.enough.spiney.org:21...
this.should.be.enough.spiney.org: Host not found
unlink: No such file or directory
spiney:~ $ echo $?
0
spiney:~ $ wget --use-proxy=off "ftp://this.should.be.enough.spiney.org/README*"
--14:19:12-- ftp://this.should.be.enough.spiney.org/README*
           => `.listing'
Connecting to this.should.be.enough.spiney.org:21...
this.should.be.enough.spiney.org: Host not found
unlink: No such file or directory
spiney:~ $ echo $?
0
spiney:~ $ wget --use-proxy=off "ftp://this.should.be.enough.spiney.org/README"
--14:19:21-- ftp://this.should.be.enough.spiney.org/README
           => `README'
Connecting to this.should.be.enough.spiney.org:21...
this.should.be.enough.spiney.org: Host not found
spiney:~ $ echo $?
1
spiney:~ $

-- Guillaume Morin [EMAIL PROTECTED] Marry me girl, be my only fairy to the world (RHCP)
Re: bug in wget 1.8
Vladimir Volovich [EMAIL PROTECTED] writes: while downloading some file (via http) with wget 1.8, i got an error: assertion failed: p - bp->buffer <= bp->width, file progress.c, line 673 Abort (core dumped) Thanks for the report. It's a known problem in 1.8, fixed by this patch.

Index: src/progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -r1.21 -r1.22
--- src/progress.c	2001/12/09 01:24:40	1.21
+++ src/progress.c	2001/12/09 04:51:40	1.22
@@ -647,7 +647,7 @@
     /* Hours not printed: pad with three spaces (two digits and colon). */
     APPEND_LITERAL ("   ");
-  else if (eta_hrs >= 10)
+  else if (eta_hrs < 10)
     /* Hours printed with one digit: pad with one space. */
     *p++ = ' ';
   else
bug in wget rate limit feature
Hi, Today I downloaded the new wget release (1.8) (I'm a huge fan of the util btw ;p ) and have been trying out the rate-limit feature. When I run:

wget --limit-rate=20k http://www.planetmirror.com/pub/debian-cd/2.1_r4/i386/binary-i386-1.iso

I get a core dump with the following output:

--10:16:54-- http://www.planetmirror.com/pub/debian-cd/2.1_r4/i386/binary-i386-1.iso
           => `binary-i386-1.iso.1'
Resolving twist... done.
Connecting to twist[167.123.1.1]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 639,348,736 [application/octet-stream]

 0% [ ] 64,306  19.26K/s  ETA 9:00:08
assertion "p - bp->buffer <= bp->width" failed: file progress.c, line 673
Abort (core dumped)

twist is our web proxy (running squid). The funny thing is I can snarf the whole intranet using the -m and rate-limit options with no bugs at all; a huge iso, though, just makes it fall over. I'm running FreeBSD 4.1 (which until it hits ports may be a problem). I can't test a non-proxied Linux PC from here to see if the same thing happens when grabbing an iso. keep up the good work with wget!
Re: bug in wget rate limit feature
[EMAIL PROTECTED] writes: Today I downloaded the new wget release (1.8) (I'm a huge fan of the util btw ;p ) and have been trying out the rate-limit feature. [...] assertion "p - bp->buffer <= bp->width" failed: file progress.c, line 673 Thanks for the report. The bug shows with downloads whose ETA is 10 or more hours, and is trivially fixed by this patch, already applied to the CVS:

Index: progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -r1.21 -r1.22
--- progress.c	2001/12/09 01:24:40	1.21
+++ progress.c	2001/12/09 04:51:40	1.22
@@ -647,7 +647,7 @@
     /* Hours not printed: pad with three spaces (two digits and colon). */
     APPEND_LITERAL ("   ");
-  else if (eta_hrs >= 10)
+  else if (eta_hrs < 10)
     /* Hours printed with one digit: pad with one space. */
     *p++ = ' ';
   else
Bug in wget 1.7 prev init.c: wgetrc environment var
In wget 1.7 and 1.6, if the WGETRC environment variable is set but the file specified is inaccessible, the message: wget: (null): No such file or directory. is displayed and the program exits with status 1. Debugging traces the problem to the following function in init.c (ca. line 261):

/* Return the path to the user's .wgetrc.  This is either the value of
   `WGETRC' environment variable, or `$HOME/.wgetrc'.

   If the `WGETRC' variable exists but the file does not exist, the
   function will exit().  */
static char *
wgetrc_file_name (void)
{
  char *env, *home;
  char *file = NULL;

  /* Try the environment.  */
  env = getenv ("WGETRC");
  if (env && *env)
    {
      if (!file_exists_p (env))
        {
          fprintf (stderr, "%s: %s: %s.\n", exec_name, file, strerror (errno));
          exit (1);
        }
      return xstrdup (env);
    }

where the error message is printed. Firstly, file is a null pointer at the time this error message is printed; env is the correct pointer to use here. Secondly, there is no explanation of why the program is looking for this file. A possible fix is as follows:

278c278
< 	  fprintf (stderr, "%s: Unable to access WGETRC specified in environment: %s: %s.\n", exec_name, env, strerror (errno));
---
> 	  fprintf (stderr, "%s: %s: %s.\n", exec_name, file, strerror (errno));

The resultant output is now (when WGETRC is set to c:\.wgetrc and this file doesn't exist): wget: Unable to access WGETRC specified in environment: c:\.wgetrc: No such file or directory. Note: debugging and patching were done with the version 1.6 source. I have upgraded my executable to 1.7 and the bug still exists, but I haven't obtained the source code to see if there are any changes in this function between 1.6 and 1.7. Warm Regards, Chris
Bug in wget 1.7
Hello. I have discovered a bug in wget 1.7. When I try to get this page: http://www.lehele.de/ this error occurs:

wget -d -r -l 1 www.lehele.de
DEBUG output created by Wget 1.7 on linux-gnu.
parseurl (www.lehele.de) -> host www.lehele.de -> opath -> dir -> file -> ndir
newpath: /
Checking for www.lehele.de in host_name_address_map.
Checking for www.lehele.de in host_slave_master_map.
First time I hear about www.lehele.de by that name; looking it up.
Caching www.lehele.de -> 212.227.118.88
Checking again for www.lehele.de in host_slave_master_map.
--19:59:51-- http://www.lehele.de/
           => `www.lehele.de/index.html'
Verbindungsaufbau zu www.lehele.de:80... Found www.lehele.de in host_name_address_map: 212.227.118.88
Created fd 3.
verbunden!
---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.7
Host: www.lehele.de
Accept: */*
Connection: Keep-Alive
---request end---
HTTP Anforderung gesendet, auf Antwort wird gewartet... HTTP/1.1 200 OK
Date: Wed, 03 Oct 2001 18:03:52 GMT
Server: Apache/1.3.14 (Unix)
Connection: close
Content-Type: text/html
Länge: nicht spezifiziert [text/html]

    0K ... @ 332.21 B/s

Closing fd 3
20:00:15 (332.21 B/s) - »www.lehele.de/index.html« gespeichert [3830]
parseurl (www.lehele.de) -> host www.lehele.de -> opath -> dir -> file -> ndir
newpath: /
Loaded www.lehele.de/index.html (size 3830).
Speicherzugriffsfehler (core dumped)

(The localized messages are German; the last line is a segmentation fault.) The file index.html is saved, and complete, in the directory www.lehele.de. If I call wget without recursion then everything is OK, but when I try to go deeper wget crashes. -Thomas
Re: Size bug in wget-1.7
On 17 Aug 2001, at 11:41, Dave Turner wrote:

> On Fri, 17 Aug 2001, Dave Turner wrote:
> > By way of a hack I have used the SIZE command, not supported by RFC 959
> > but still accepted by many of the servers I use, to get the size of the
> > file. If that fails then it falls back on the old method. The patch is
> > attached, in what I hope is an acceptable format.
> Guess who forgot the attachment? Sorry!

Nice patch. I think it can be improved by only sending the SIZE command if the file already exists (has a non-zero size, i.e. the restval parameter in function getftp is non-zero). I have attached a slightly modified version of Dave Turner [EMAIL PROTECTED]'s patch to add the non-zero restval test and remove an unused variable 's' from function ftp_size, and I have created the patch using the command `cvs diff -uR' to make it relative to the current CVS sources for wget-1.7.1-pre1. Here is a ChangeLog entry for it:

2001-08-21  Dave Turner  [EMAIL PROTECTED]

	* ftp-basic.c (ftp_size): New function to send non-standard SIZE
	command to server to request file size.
	* ftp.h (ftp_size): Export it.
	* ftp.c (getftp): Use new ftp_size function if restoring transfer
	of a file with unknown size.

wget-1.7.1-pre1-size-fix.patch
Size bug in wget-1.7
Not sure if this is wget's fault or a broken server, but it happens on a lot of servers so maybe it should be handled better.

The bug seems to manifest itself when resuming an FTP transfer and the length is unauthoritative. The reported total length is in fact the remaining length (i.e. the total length minus the length downloaded); the reported remaining length is the total length minus twice the length downloaded, which goes negative once you've downloaded 50% of the file!

For example, the actual size of kdebase-2.2.tar.bz2 is 10917131 bytes, and this is what my log says when the transfer was resumed at 56%:

--11:38:44--  ftp://ftp.sourceforge.net/pub/mirrors/kde/stable/2.2/src/kdebase-2.2.tar.bz2
           => `kdebase-2.2.tar.bz2'
Connecting to ftp.sourceforge.net:21... connected!
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/mirrors/kde/stable/2.2/src ... done.
==> PORT ... done.    ==> REST 6131968 ... done.
==> RETR kdebase-2.2.tar.bz2 ... done.
Length: 4,785,163 [-1,346,805 to go] (unauthoritative)

[ skipping 5760K ]
 5760K ...                                     131% @   2.07 KB/s

As an artefact of this bug, the percentage downloaded is also incorrect (also shown here).

Yours, Dave Turner [EMAIL PROTECTED]
bug in wget-1.7/doc/Makefile.in
hi, guess there is a bug in the Makefile.in of the doc directory: wget.1 couldn't be found if the --srcdir option is used ...

regards michael

8<-----
diff doc/Makefile.in doc/Makefile.in_new
118c118
<	$(INSTALL_DATA) $(srcdir)/$(MAN) $(DESTDIR)$(mandir)/man$(manext)/$(MAN)
---
>	$(INSTALL_DATA) $(MAN) $(DESTDIR)$(mandir)/man$(manext)/$(MAN)
Serious bug in Wget/1.6
Hi! I found the following in the log file of piology.org:

202.108.68.179 - - [15/Jul/2001:10:50:19 +0200] "GET /3.14/ HTTP/1.0" 404 2332 "http://www.go2net.com/useless/useless/pi.html" "Wget/1.6"
202.108.68.179 - - [15/Jul/2001:12:49:38 +0200] "GET /elmi/ HTTP/1.0" 404 2316 "http://piology.org/photo.html" "Wget/1.6"
202.108.68.179 - - [15/Jul/2001:13:08:45 +0200] "GET /cgi-bin/events.cgi?act=1&Event=3 HTTP/1.0" 404 2347 "http://piology.org/photo.html" "Wget/1.6"

If you look at the referrers you see that none of this is true (all those pages haven't changed recently). I guess the problem is that wget ignores virtual hosts: if it notices that two machines have the same IP, it changes absolute links to relative ones on the presumed host.

pi