Re: WGET bug...

2008-07-11 Thread Micah Cowan

HARPREET SAWHNEY wrote:
> Hi,
> 
> Thanks for the prompt response.
> 
> I am using
> 
> GNU Wget 1.10.2
> 
> I tried a few things on your suggestion but the problem remains.
> 
> 1. I exported the cookies file in Internet Explorer and specified
> that in the Wget command line. But same error occurs.
> 
> 2. I have an open session on the site with my username and password.
> 
> 3. I also tried running wget while I am downloading a file from the
> IE session on the site, but the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to
login to the website. This requires site-specific information from the
user-login form page, though, so I can't help you without that.

If you know how to read some HTML, then you can find the HTML form used
for posting username/password stuff, and use

wget --keep-session-cookies --save-cookies=cookies.txt \
     --post-data='username=foo&password=bar' ACTION

Where ACTION is the value of the form's action attribute, "username" and
"password" (and possibly further required values) are the field names from
the HTML form, and "foo" and "bar" are your actual username and password.
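
Once the login succeeds, a follow-up fetch can reuse the saved session
cookie. A minimal sketch, assuming the form posts to login.php on the same
site and that the target is the file you were originally fetching (both URLs
here are placeholders, not the real ones):

wget --keep-session-cookies --save-cookies=cookies.txt \
     --post-data='username=foo&password=bar' http://example.com/login.php
wget --load-cookies=cookies.txt http://example.com/path/to/file.upc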

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: WGET bug...

2008-07-11 Thread Micah Cowan

HARPREET SAWHNEY wrote:
> Hi,
> 
> I am getting a strange bug when I use wget to download a binary file
> from a URL versus when I manually download.
> 
> The attached ZIP file contains two files:
> 
> 05.upc --- manually downloaded
> dum.upc --- downloaded through wget
> 
> wget adds a number of ascii characters to the head of the file and seems
> to delete a similar number from the tail.
> 
> So the file sizes are the same but the addition and deletion renders
> the file useless.
> 
> Could you please direct me on if I should be using some specific
> option to avoid this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at
the front of your session, and then giving you the file contents anyway.
It's a bug in the PHP code that serves the file.

You're getting this extra content because you are not logged in when
you're fetching it. You need to have Wget send a cookie with the
login-session information, and then the server will probably stop
sending the corrupting information at the head of the file. The site
does not appear to use HTTP's authentication mechanisms, so the
<[EMAIL PROTECTED]> bit in the URL doesn't do you any good. It uses
Forms-and-cookies authentication.

Hopefully, you're using a browser that stores its cookies in a text
format, or that is capable of exporting to a text format. In that case,
you can just ensure that you're logged in in your browser, and use the
--load-cookies=<file> option to Wget to use the same session
information.
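
For example, a sketch (the cookie file name and the target URL are
placeholders for whatever your browser exported and whatever you were
fetching):

wget --load-cookies=cookies.txt http://example.com/path/to/dum.upc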

Otherwise, you'll need to use --save-cookies with Wget to simulate the
login form post, which is tricky and requires some understanding of HTML
Forms.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: wget bug?

2007-07-09 Thread Matthias Vill

Mauro Tortonesi wrote:
> On Mon, 9 Jul 2007 15:06:52 +1200
> [EMAIL PROTECTED] wrote:
>
>> wget under win2000/win XP
>> I get "No such file or directory" error messages when using the following
>> command line.
>>
>> wget -s --save-headers
>> "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"
>>
>> %1 = 212BI
>> Any ideas?
>
> hi nikolaus,
>
> in windows, you're supposed to use %VARIABLE_NAME% for variable substitution.
> try using %1% instead of %1.



AFAIK it's OK to use %1, because it is a special case. Besides, if the
variable were substituted the wrong way, wouldn't the error be a 404 or
some other wget error? (Actually, even then you get a 200 response with
that URL.)


I just tried using the command inside a batch file and came across
another problem: you used a lowercase -s, which is not recognized by my
wget version, but an uppercase -S is. I guess you should change that.


I would guess wget is not in your PATH.
Try using "c:\path\to\the directory\wget.exe" instead of just wget.

If this too does not help, add an explicit "--restrict-file-names=windows" to
your options, so that wget does not try to use the ? inside a filename.
(Normally this is not needed.)


So a should-work-in-all-cases version is

"c:\path\wget.exe" -S --save-headers --restrict-file-names=windows
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

Of course that is just one line, but my dumb mail editor wrapped it.
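
In case it helps, here is a minimal batch-file sketch around that command;
the file name get.bat and the install path c:\path\ are assumptions, and %1
is the batch argument (e.g. 212BI):

@echo off
rem get.bat -- call it as:  get.bat 212BI
"c:\path\wget.exe" -S --save-headers --restrict-file-names=windows "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"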

Greetings
Matthias


Re: wget bug?

2007-07-09 Thread Mauro Tortonesi
On Mon, 9 Jul 2007 15:06:52 +1200
[EMAIL PROTECTED] wrote:

> wget under win2000/win XP
> I get "No such file or directory" error messages when using the follwing 
> command line.
> 
> wget -s --save-headers 
> "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc";
> 
> %1 = 212BI
> Any ideas?

hi nikolaus,

in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. 
try using %1% instead of %1.

-- 
Mauro Tortonesi <[EMAIL PROTECTED]>


RE: wget bug

2007-05-24 Thread Tony Lewis
Highlord Ares wrote:

> it tries to download web pages named similar to
> http://site.com?variable=yes&mode=awesome

Since "&" is a reserved character in many command shells, you need to quote
the URL on the command line:

wget "http://site.com?variable=yes&mode=awesome"
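
Without the quotes, a Unix shell treats the "&" as a command separator, so
wget only ever sees the part of the URL before it. A quick sketch of the
contrast (same made-up site as above):

wget http://site.com?variable=yes&mode=awesome     # shell splits at "&"; mode=awesome is lost
wget "http://site.com?variable=yes&mode=awesome"   # the full query string reaches wget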

 

Tony

 



RE: wget bug

2007-05-23 Thread Willener, Pat
This does not look like a valid URL to me - shouldn't there be a slash at the 
end of the domain name?
 
Also, when talking about a bug (or anything else), it is always helpful if you 
specify the wget version (number).



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Highlord Ares
Sent: Thursday, May 24, 2007 11:41
To: [EMAIL PROTECTED]
Subject: wget bug


when I run wget on certain sites, it tries to download web pages named
similar to http://site.com?variable=yes&mode=awesome.  However, wget isn't
saving any of these files, no doubt because of some file naming issue?  This
problem exists in both the Windows & Unix versions.

hope this helps



Re: wget bug in finding files after disconnect

2006-11-18 Thread Georg Schulte Althoff
Paul Bickerstaff <[EMAIL PROTECTED]> wrote in 
news:[EMAIL PROTECTED]:

> I'm using wget version "GNU Wget 1.10.2 (Red Hat modified)" on a fedora
> core5 x86_64 system (standard wget rpm). I'm also using version 1.10.2b
> on a WinXP laptop. Both display the same faulty behaviour which I don't
> believe was present in earlier versions of wget that I've used.
> 
> When the internet connection disconnects wget automatically tries to
> redownload the file (starting from where it was disconnected).
> 
> The problem is that it is consistently failing to find the file. The
> following output shows what is happening.
> 
> wget -c ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/nr.*.tar.gz
[...]
> Retrying.
> 
> --14:13:54--
> ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/nr.00.tar.gz
>   (try: 2) => `nr.00.tar.gz'
> Connecting to bio-mirror.jp.apan.net|150.26.2.58|:21... connected.
> Logging in as anonymous ... Logged in!
> ==> SYST ... done.==> PWD ... done.
> ==> TYPE I ... done.  ==> CWD not required.
> ==> PASV ... done.==> REST 315859600 ... done.
> ==> RETR nr.00.tar.gz ...
> No such file `nr.00.tar.gz'.
> 
[...]
> 
> I have checked and the files are there and have not moved or altered in
> any way.
> 
> I believe that the problem is almost certainly associated with the
> logged item "CWD not required" after a reconnect.
> 
> Cheers

I encountered the same situation and solved it this way:
Call wget with -B (--base) option to set base directory
and with -i (--input-file) to point to a file containing
the relative URLs you want to download.

Not tested, but it should look like this
  wget -c --base=ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/ --input-file=urls.txt
with urls.txt containing
  nr.*.tar.gz

Hope it helps you.

Georg



Re: [WGET BUG] - Can not retreive image from cacti

2006-06-19 Thread Steven M. Schweda
From Thomas GRIMONET:

> [...]
> File is created but it is empty.

   That's normal with "-O" if Wget fails for some reason.

   It might help the diagnosis to see the actual Wget command instead of
the code which generates the Wget command.  If that doesn't show you
anything, then adding "-d" to the command might help more.

   Normally, when Wget fails for some reason, it emits an error message. 
Where's yours?



   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget Bug: recursive get from ftp with a port in the url fails

2006-04-13 Thread Hrvoje Niksic
"Jesse Cantara" <[EMAIL PROTECTED]> writes:

> A quick resolution to the problem is to use the "-nH" command line
> argument, so that wget doesn't attempt to create that particular
> directory. It appears as if the problem is with the creation of a
> directory with a ':' in the name, which I cannot do outside of wget
> either. I am not sure if that is specific to my filesystem, or to
> linux in general.

It's not specific to Linux, so it must be your file system.  Are you
perhaps running Wget on a FAT32-mounted partition?  If so, try using
--restrict-file-names=windows.
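
For instance, either of these should work (the host and port are made up
here; the point is that -nH avoids creating a local "host.example:2121"
directory at all, while the restrict option escapes the ':' instead):

wget -r -nH "ftp://host.example:2121/pub/files/"
wget -r --restrict-file-names=windows "ftp://host.example:2121/pub/files/"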

Thanks for the report.


Re: wget BUG: ftp file retrieval

2005-11-26 Thread Steven M. Schweda
From: Hrvoje Niksic

> [...]  On Unix-like FTP servers, the two methods would
> be equivalent.

   Right.  So I resisted temptation, and kept the two-step CWD method in
my code for only a VMS FTP server.  My hope was that someone would look
at the method, say "That's a good idea", and change the "if" to let it
be used everywhere.

   Of course, I'm well known to be delusional in these matters.



   Steven M. Schweda               (+1) 651-699-9818
   382 South Warwick Street        [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget BUG: ftp file retrieval

2005-11-26 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

>> and adding it fixed many problems with FTP servers that log you in
>> a non-/ working directory.
>
> Which of those problems would _not_ be fixed by my two-step CWD for
> a relative path?  That is: [...]

That should work too.  On Unix-like FTP servers, the two methods would
be equivalent.

Thanks for the suggestion.  I realized your patch contained
improvements for dealing with VMS FTP servers, but I somehow managed
to miss this explanation.


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Steven M. Schweda
From: Hrvoje Niksic

> Prepending is already there,

   Yes, it certainly is, which is why I had to disable it in my code for
VMS FTP servers.

>  and adding it fixed many problems with
> FTP servers that log you in a non-/ working directory.

   Which of those problems would _not_ be fixed by my two-step CWD for a
relative path?  That is:

  1. CWD to the string which the server reported in its initial PWD
     response.

  2. CWD to the relative path in the URL ("A/B" in our current
     example).

On a VMS server, the first path is probably pure VMS, so it works, and
the second path is pure UNIX, so it also works (on all the servers I've
tried, at least).  As I remark in the (seldom-if-ever-read) comments in
my "src/ftp.c", I see no reason why this scheme would fail on any
reasonable server.  But I'm always open to a good argument, especially
if it includes a demonstration of a good counter-example.
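
To make the two steps concrete, the command sequence for the running example
would look roughly like this (the first path is whatever the server reported
in its initial PWD reply; the VMS-style value below is just the earlier
example):

  CWD SYS$SYSDEVICE:[ANONYMOUS]   (back to the login directory, in the server's own syntax)
  CWD A/B                         (then the relative path from the URL, in UNIX syntax)
  RETR F.X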

   This (in my opinion, stinking-bad) prepending code is the worst part
of what makes the current (not-mine) VMS FTP server code so awful. 
(Running a close second is the part which discards the device name from
the initial PWD response, which led to a user complaint in this forum a
while back, involving an inability to specify a different device in a
URL.)



   Steven M. Schweda               (+1) 651-699-9818
   382 South Warwick Street        [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Hrvoje Niksic
Daniel Stenberg <[EMAIL PROTECTED]> writes:

> On Fri, 25 Nov 2005, Steven M. Schweda wrote:
>
>>   Or, better yet, _DO_ forget to prepend the trouble-causing $CWD to
>> those paths.
>
> I agree. What good would prepending do?

Prepending is already there, and adding it fixed many problems with
FTP servers that log you in a non-/ working directory.


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Daniel Stenberg

On Fri, 25 Nov 2005, Steven M. Schweda wrote:

>   Or, better yet, _DO_ forget to prepend the trouble-causing $CWD to those
> paths.


I agree. What good would prepending do? It will most definitely add problems 
such as those Steven describes.


--
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Steven M. Schweda
From: Hrvoje Niksic

> Also don't [forget to] prepend the necessary [...] $CWD
> to those paths.

   Or, better yet, _DO_ forget to prepend the trouble-causing $CWD to
those paths.

   As you might recall from my changes for VMS FTP servers (if you had
ever looked at them), this scheme causes no end of trouble.  A typical
VMS FTP server reports the CWD in VMS form (for example,
"SYS$SYSDEVICE:[ANONYMOUS]").  It may be willing to use a UNIX-like path
in a CWD command (for example, "CWD A/B"), but it's _not_ willing to use
a mix of them (for example, "SYS$SYSDEVICE:[ANONYMOUS]/A/B").

   At a minimum, a separate CWD should be used to restore the initial
directory.  After that, you can do what you wish.  On my server at least
(HP TCPIP V5.4), "GET A/B/F.X" will work, but the mixed mess is unlikely
to work on any VMS FTP server.



   Steven M. Schweda               (+1) 651-699-9818
   382 South Warwick Street        [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Hrvoje Niksic
Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> That might work.  Also don't prepend the necessary prepending of $CWD
> to those paths.

Oops, I meant "don't forget to prepend ...".


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Hrvoje Niksic
Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>> Arne Caspari <[EMAIL PROTECTED]> writes:
>>
>> I believe that CWD is mandated by the FTP specification, but you're
>> also right that Wget should try both variants.
>
> i agree. perhaps when retrieving file A/B/F.X we should try to use:
>
> GET A/B/F.X
>
> first, then:
>
> CWD A/B
> GET F.X
>
> if the previous attempt failed, and:
>
> CWD A
> CWD B
> GET F.X
>
> as a last resort. what do you think?

That might work.  Also don't prepend the necessary prepending of $CWD
to those paths.


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Arne Caspari
Thank you all for your very fast response. As a further note: When this 
error occurs, wget bails out with the following error message:

"No such directory foo/bar".

I think it should instead be "Could not access foo/bar: Permission 
denied" or similar in such a situation.


/Arne




Re: wget BUG: ftp file retrieval

2005-11-25 Thread Mauro Tortonesi

Hrvoje Niksic wrote:
> Arne Caspari <[EMAIL PROTECTED]> writes:
>
> I believe that CWD is mandated by the FTP specification, but you're
> also right that Wget should try both variants.


i agree. perhaps when retrieving file A/B/F.X we should try to use:

GET A/B/F.X

first, then:

CWD A/B
GET F.X

if the previous attempt failed, and:

CWD A
CDW B
GET F.X

as a last resort. what do you think?

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget BUG: ftp file retrieval

2005-11-25 Thread Hrvoje Niksic
Arne Caspari <[EMAIL PROTECTED]> writes:

> When called like:
> wget user:[EMAIL PROTECTED]/foo/bar/file.tgz
>
> and foo or bar is a read/execute protected directory while file.tgz is
> user-readable, wget fails to retrieve the file because it tries to CWD
> into the directory first.
>
> I think the correct behaviour should be not to CWD into the
> directory but to issue a GET request with the full path instead (
> which will succeed ).

I believe that CWD is mandated by the FTP specification, but you're
also right that Wget should try both variants.  You can force Wget
into getting the file without CWD using this kludge:

wget ftp://user:[EMAIL PROTECTED]/%2Ffoo%2Fbar%2Ffile.tgz -O file.tgz


Re: wget bug report

2005-06-24 Thread Hrvoje Niksic
<[EMAIL PROTECTED]> writes:

> Sorry for the crosspost, but the wget Web site is a little confusing
> on the point of where to send bug reports/patches.

Sorry about that.  In this case, either address is fine, and we don't
mind the crosspost.

> After taking a look at it, i implemented the following change to
> http.c and tried again. It works for me, but i don't know what other
> implications my change might have.

It's exactly the correct change.  A similar fix has already been
integrated in the CVS (in fact subversion) code base.

Thanks for the report and the patch.


Re: Wget Bug

2005-04-26 Thread Hrvoje Niksic
Arndt Humpert <[EMAIL PROTECTED]> writes:

> wget, win32 rel. crashes with huge files.

Thanks for the report.  This problem has been fixed in the latest
version, available at http://xoomer.virgilio.it/hherold/ .


Re: WGET Bug?

2005-04-04 Thread Hrvoje Niksic
"Nijs, J. de" <[EMAIL PROTECTED]> writes:

> #
> C:\Grabtest\wget.exe -r --tries=3 http://www.xs4all.nl/~npo/ -o
> C:/Grabtest/Results/log
> #
> --16:23:02--  http://www.xs4all.nl/%7Enpo/
>   => `www.xs4all.nl/~npo/index.html'
> Resolving www.xs4all.nl... 194.109.6.92
> Connecting to www.xs4all.nl[194.109.6.92]:80... failed: No such file or
> directory.
> Retrying.
> #
>
> Is WGET always expecting an INDEX.HTML as the URL file for grabbing data
> from the WWW?

No, what you see is the result of two different things:

1. Wget uses "index.html" as the file name when one is missing from
   the URL because it ends with an empty path component.

2. Wget 1.9.1 (and previous versions) doesn't correctly display
   Winsock error messages, such as "connection refused".  The error
   message you're seeing doesn't reflect what really happened.

In this case, only issue #2 is a real bug.  It has been fixed in the
CVS version, which is unfortunately not yet available as a Windows
binary.


Re: wget bug: spaces in directories mapped to %20

2005-01-17 Thread Jochen Roderburg
Quoting Tony O'Hagan <[EMAIL PROTECTED]>:

> Original path:  abc def/xyz pqr.gif
> After wget mirroring:   abc%20def/xyz pqr.gif   (broken link)
>
> wget --version  is GNU Wget 1.8.2
>

This was a "well-known error" in the 1.8 versions of wget, which is already
corrected in the 1.9 versions.

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: wget bug with ftp/passive

2004-08-12 Thread Jeff Connelly
On Wed, 21 Jan 2004 23:07:30 -0800, you wrote:
>Hello,
>I think I've come across a little bug in wget when using it to get a file
>via ftp.
>
>I did not specify the "passive" option, yet it appears to have been used
>anyway Here's a short transcript:
Passive FTP can be specified in /etc/wgetrc or /usr/local/etc/wgetrc, and then
it's impossible to turn it off. There is no --active-mode flag as far
as I can tell.
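
For reference, the kind of wgetrc line that forces this looks like the
following (a sketch -- check your own /etc/wgetrc or /usr/local/etc/wgetrc
for it):

passive_ftp = on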

I submitted a patch to wget-patches under the title of
"Patch to add --active-ftp and make --passive-ftp default", which does
what it says. Your configuration is setting passive mode to default, but
the stock wget defaults to active (active mode doesn't work too well
behind some firewalls). --active-ftp is a very useful option in these
cases.

Last I checked, the patch hasn't been committed. I can't find the wget-patches
mail archives anywhere, either. So I'll paste it here, in hopes that it helps.

-Jeff Connelly

=cut here=
Common subdirectories: doc.orig/ChangeLog-branches and doc/ChangeLog-branches
diff -u doc.orig/wget.pod doc/wget.pod
--- doc.orig/wget.pod   Wed Jul 21 20:17:29 2004
+++ doc/wget.podWed Jul 21 20:18:56 2004
@@ -888,12 +888,17 @@
 system-specific.  This is why it currently works only with Unix FTP
 servers (and the ones emulating Unix C output).

+=item B<--active-ftp>
+
+Use the I<active> FTP retrieval scheme, in which the server
+initiates the data connection. This is sometimes required to connect
+to FTP servers that are behind firewalls.

 =item B<--passive-ftp>

 Use the I<passive> FTP retrieval scheme, in which the client
 initiates the data connection.  This is sometimes required for FTP
-to work behind firewalls.
+to work behind firewalls, and as such is enabled by default.


 =item B<--retr-symlinks>
Common subdirectories: src.orig/.libs and src/.libs
Common subdirectories: src.orig/ChangeLog-branches and src/ChangeLog-branches
diff -u src.orig/init.c src/init.c
--- src.orig/init.c Wed Jul 21 20:17:33 2004
+++ src/init.c  Wed Jul 21 20:17:59 2004
@@ -255,6 +255,7 @@
   opt.ftp_glob = 1;
   opt.htmlify = 1;
   opt.http_keep_alive = 1;
+  opt.ftp_pasv = 1;
   opt.use_proxy = 1;
   tmp = getenv ("no_proxy");
   if (tmp)
diff -u src.orig/main.c src/main.c
--- src.orig/main.c Wed Jul 21 20:17:33 2004
+++ src/main.c  Wed Jul 21 20:17:59 2004
@@ -217,7 +217,8 @@
 FTP options:\n\
   -nr, --dont-remove-listing   don\'t remove `.listing\' files.\n\
   -g,  --glob=on/off   turn file name globbing on or off.\n\
-   --passive-ftp   use the \"passive\" transfer mode.\n\
+   --passive-ftp   use the \"passive\" transfer mode (default).\n\
+   --active-ftpuse the \"active\" transfer mode.\n\
   --retr-symlinks when recursing, get linked-to files (not dirs).\n\
 \n"), stdout);
   fputs (_("\
@@ -285,6 +286,7 @@
 { "no-parent", no_argument, NULL, 133 },
 { "non-verbose", no_argument, NULL, 146 },
 { "passive-ftp", no_argument, NULL, 139 },
+{ "active-ftp", no_argument, NULL, 167 },
 { "page-requisites", no_argument, NULL, 'p' },
 { "quiet", no_argument, NULL, 'q' },
 { "random-wait", no_argument, NULL, 165 },
@@ -397,6 +399,9 @@
case 139:
  setval ("passiveftp", "on");
  break;
+case 167:
+  setval ("passiveftp", "off");
+  break;
case 141:
  setval ("noclobber", "on");
  break;


Re: wget bug with ftp/passive

2004-01-22 Thread Hrvoje Niksic
don <[EMAIL PROTECTED]> writes:

> I did not specify the "passive" option, yet it appears to have been used
> anyway Here's a short transcript:
>
> [EMAIL PROTECTED] sim390]$ wget ftp://musicm.mcgill.ca/sim390/sim390dm.zip
> --21:05:21--  ftp://musicm.mcgill.ca/sim390/sim390dm.zip
>=> `sim390dm.zip'
> Resolving musicm.mcgill.ca... done.
> Connecting to musicm.mcgill.ca[132.206.120.4]:21... connected.
> Logging in as anonymous ... Logged in!
> ==> SYST ... done.==> PWD ... done.
> ==> TYPE I ... done.  ==> CWD /sim390 ... done.
> ==> PASV ...
> Cannot initiate PASV transfer.

Are you sure that something else hasn't done it for you?  For example,
a system-wide initialization file `/usr/local/etc/wgetrc' or
`/etc/wgetrc'.
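
A quick way to check, assuming the usual locations (adjust the paths if your
build uses a different sysconfdir):

grep -i passive /etc/wgetrc /usr/local/etc/wgetrc ~/.wgetrc 2>/dev/null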


Re: wget bug

2004-01-12 Thread Hrvoje Niksic
Kairos <[EMAIL PROTECTED]> writes:

> $ cat wget.exe.stackdump
[...]

What were you doing with Wget when it crashed?  Which version of Wget
are you running?  Was it compiled for Cygwin or natively for Windows?


Re: Wget Bug

2003-11-10 Thread Hrvoje Niksic
"Kempston" <[EMAIL PROTECTED]> writes:

> Yeah, I understand that, but lftp handles it fine even without
> specifying any additional option ;)

But then lftp is hammering servers when real unauthorized entry
occurs, no?

> I`m sure you can work something out

Well, I'm satisfied with what Wget does now.  :-)


Re: Wget Bug

2003-11-10 Thread Hrvoje Niksic
The problem is that the server replies with "login incorrect", which
normally means that authorization has failed and that further retries
would be pointless.  Other than having a natural language parser
built-in, Wget cannot know that the authorization is in fact correct,
but that the server happens to be busy.

Maybe Wget should have an option to retry even in the case of (what
looks like) a login incorrect FTP response.


Re: wget bug

2003-09-26 Thread Hrvoje Niksic
Jack Pavlovsky <[EMAIL PROTECTED]> writes:

> It's probably a bug: when downloading wget -mirror
> ftp://somehost.org/somepath/3acv14~anivcd.mpg, wget saves it as-is,
> but when downloading wget ftp://somehost.org/somepath/3*, wget saves
> the files as 3acv14%7Eanivcd.mpg

Thanks for the report.  The problem here is that Wget tries to be
"helpful" by encoding unsafe characters in file names to %XX, as is
done in URLs.  Your first example works because of an oversight (!) 
that actually made Wget behave as you expected.

The good news is that the "helpfulness" has been rethought for the
next release and is no longer there, at least not for ordinary
characters like "~" and " ".  Try getting the latest CVS sources, they
should work better in this regard.  (http://wget.sunsite.dk/ explains
how to download the source from CVS.)


Re: wget bug

2003-09-26 Thread DervishD
Hi Jack :)

 * Jack Pavlovsky <[EMAIL PROTECTED]> dixit:
> It's probably a bug:
> when downloading 
> wget -mirror ftp://somehost.org/somepath/3acv14~anivcd.mpg, 
>  wget saves it as-is, but when downloading
> wget ftp://somehost.org/somepath/3*, wget saves the files as 
> 3acv14%7Eanivcd.mpg

Yes, it *was* a bug. The latest prerelease has it fixed. Don't
know if the tarball has the latest patches, ask Hrvoje. But if you
are not in a hurry, just wait for 1.9 to be released.

> The human knowledge belongs to the world

True ;))

Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


Re: wget bug: mirror doesn't delete files deleted at the source

2003-08-01 Thread Aaron S. Hawley
On Fri, 1 Aug 2003, Mordechai T. Abzug wrote:

> I'd like to use wget in mirror mode, but I notice that it doesn't
> delete files that have been deleted at the source site.  Ie.:
>
>   First run: the source site contains "foo" and "bar", so the mirror now
>   contains "foo" and "bar".
>
>   Before second run: the source site deletes "bar" and replaces it with
>   "ook", and the mirror is run again.
>
>   After second run: the mirror now contains "foo", "bar", and "ook".
>
> This is not usually the way that mirrors work; wget should delete
> "bar" if it's not at the site.

I don't disagree with your definition of "mirrors", but in Unix (and in GNU)
it's usually customary not to delete files without user permission.

http://www.google.com/search?q=wget+archives+delete+mirror+site%3Ageocrawler.com


Re: wget bug

2002-11-05 Thread Jeremy Hetzler
At 09:20 AM 11/5/2002 -0700, Jing Ping Ye wrote:

Dear Sir:
I tried to use "wget" download data from ftp site but got error message as 
following:
> 
wget 
ftp://ftp.ngdc.noaa.gov/pub/incoming/RGON/anc_1m.OCT 

Screen shows:

--09:02:40--  ftp://ftp.ngdc.noaa.gov/pub/incoming/RGON/anc_1m.OCT
   => `anc_1m.OCT'
Resolving ftp.ngdc.noaa.gov... done.
Connecting to ftp.ngdc.noaa.gov[140.172.180.164]:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/incoming/RGON ... done.
==> PORT ... done.==> RETR anc_1m.OCT ...
Error in server response, closing control connection.
Retrying.

Using the -d switch reveals that the server refuses to send the file due to 
insufficient permissions:

200 PORT command successful.
done.==> RETR anc_1m.OCT ...
--> RETR anc_1m.OCT

550 anc_1m.OCT: Permission denied.

No such file `anc_1m.OCT'.



But when I use ftp  ( ftp ftp.ngdc.noaa.gov), I can get data.


False.


$ ftp
ftp> open ftp.ngdc.noaa.gov
Connected to ftp.ngdc.noaa.gov.
220 apex FTP server (Version wu-2.6.1(1) Thu Nov 29 13:24:22 MST 2001) ready.
Name (ftp.ngdc.noaa.gov:**): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
230-Please read the file README.txt
230-  it was last modified on Thu Jan  6 07:55:46 2000 - 1033 days ago
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /pub/incoming/RGON
250 CWD command successful.
ftp> pwd
257 "/pub/incoming/RGON" is current directory.
ftp> get anc_1m.OCT
200 PORT command successful.
550 anc_1m.OCT: Permission denied.


This is not a bug in wget.






Re: wget bug (overflow)

2002-04-15 Thread Hrvoje Niksic

I'm afraid that downloading files larger than 2G is not supported by
Wget at the moment.



Re: wget bug?!

2002-02-19 Thread TD - Sales International Holland B.V.

On Monday 18 February 2002 17:52, you wrote:

That would be great. The problem is that I'm using it to retrieve files mostly
on servers that have too many users. No, I don't want to hammer the
server, but I do want to keep on trying with reasonable intervals until I get
the file.

I think the feature would be usable in other scenarios as well. You now have
--waitretry and --wait; in my personal opinion the best would perhaps be to
add --waitint(er)(val) or perhaps just --int(er)(val).

Anyways, thanks for the reply.

Kind regards,

Ferry van Steen

> [The message I'm replying to was sent to <[EMAIL PROTECTED]>. I'm
> continuing the thread on <[EMAIL PROTECTED]> as there is no bug and
> I'm turning it into a discussion about features.]
>
> On 18 Feb 2002 at 15:14, TD - Sales International Holland B.V. wrote:
> > I've tried -w 30
> > --waitretry=30
> > --wait=30 (I think this one is for multiple files and the time in between
> > those though)
> >
> > None of these seem to make wget wanna wait for 30 secs before trying
> > again. Like this I'm hammering the server.
>
> The --waitretry option will wait for 1 second for the first retry,
> then 2 seconds, 3 seconds, etc. up to the value specified. So you
> may consider the first few retry attempts to be hammering the
> server but it will gradually back off.
>
> It sounds like you want an option to specify the initial retry
> interval (currently fixed at 1 second), but Wget currently has no
> such option, nor an option to change the amount it increments by
> for each retry attempt (also currently fixed at 1 second).
>
> If such features were to be added, perhaps it could work something
> like this:
>
> --waitretry=n - same as --waitretry=n,1,1
> --waitretry=n,m   - same as --waitretry=n,m,1
> --waitretry=n,m,i - wait m seconds for the first retry,
> incrementing by i seconds for subsequent
> retries up to a maximum of n seconds
>
> The disadvantage of doing it that way is that no-one will remember
> which order the numbers should appear, so an alternative is to
> leave --waitretry alone and supplement it with --waitretryfirst
> and --waitretryincr options.



Re: wget bug?!

2002-02-18 Thread Ian Abbott

[The message I'm replying to was sent to <[EMAIL PROTECTED]>. I'm
continuing the thread on <[EMAIL PROTECTED]> as there is no bug and
I'm turning it into a discussion about features.]

On 18 Feb 2002 at 15:14, TD - Sales International Holland B.V. wrote:

> I've tried -w 30
> --waitretry=30
> --wait=30 (I think this one is for multiple files and the time in between 
> those though)
> 
> None of these seem to make wget wanna wait for 30 secs before trying again. 
> Like this I'm hammering the server.

The --waitretry option will wait for 1 second for the first retry,
then 2 seconds, 3 seconds, etc. up to the value specified. So you
may consider the first few retry attempts to be hammering the
server but it will gradually back off.
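
For example, with something along these lines (the URL is only a
placeholder), the gap between retries grows by one second each time until it
reaches the 30-second cap:

wget -c --waitretry=30 http://busy.example.com/file.iso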

It sounds like you want an option to specify the initial retry
interval (currently fixed at 1 second), but Wget currently has no
such option, nor an option to change the amount it increments by
for each retry attempt (also currently fixed at 1 second).

If such features were to be added, perhaps it could work something
like this:

--waitretry=n - same as --waitretry=n,1,1
--waitretry=n,m   - same as --waitretry=n,m,1
--waitretry=n,m,i - wait m seconds for the first retry,
incrementing by i seconds for subsequent
retries up to a maximum of n seconds

The disadvantage of doing it that way is that no-one will remember
which order the numbers should appear, so an alternative is to
leave --waitretry alone and supplement it with --waitretryfirst
and --waitretryincr options.



Re: [Wget]: Bug submission

2001-12-29 Thread Hrvoje Niksic

[ Please mail bug reports to <[EMAIL PROTECTED]>, not to me directly. ]

Nuno Ponte <[EMAIL PROTECTED]> writes:

> I get a segmentation fault when invoking:
> 
> wget -r
> http://java.sun.com/docs/books/performance/1st_edition/html/JPTOC.fm.html
> 
> My Wget version is 1.7-3, the one which is bundled with RedHat
> 7.2. I attached my .wgetrc.

Wget 1.7 is fairly old -- it was followed by a bugfix 1.7.1 release,
and then 1.8 and 1.8.1.  Please try upgrading to the latest version,
1.8.1, and see if the bug repeats.  I couldn't repeat it with 1.8.1.



Re: wget bug (?)

2001-11-15 Thread Ian Abbott

On 14 Nov 2001, at 13:20, Bernard, Shawn wrote:

> I'm not sure if this is a bug or not, but when I ran this line: 
> wget -r -l2 http://www.turnerclassicmovies.com/NowPlaying/Index 
> I get this result: 
(snip)
> `www.turnerclassicmovies.com/Home/Index/0,3436,,00.html' saved [27179] 
> 
> Segmentation Fault(coredump) 

That one is fixed in the CVS repository (not that the CVS 
repository has been maintained for a few months, but that's another 
matter).

As a workaround in wget 1.7, you could try using the option
'-Gmeta', as this bug usually occurs while processing large META 
tags. The '-Gmeta' option causes wget to ignore META tags.
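
Applied to the command you posted, that would be:

wget -r -l2 -Gmeta http://www.turnerclassicmovies.com/NowPlaying/Index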




Re: wget bug?

2001-06-14 Thread Jan Prikryl

"Story, Ian" wrote:

> > I have been a very happy user of wget for a long time.  However, today I
> > noticed that some sites, that don't run on port 80, don't work well with
> > wget.  For instance, when I tell wget to go get http://www.yahoo.com, it
> > automatically puts :80 at the end, like this: http://www.yahoo.com:80.
> > That is fine, most of the time, but some sites won't like that, and in
> > fact, will give a 404 error, or other errors.  So, I consulted the
> > documentation, but couldn't find a way around this...is there a
> > fix/workaround/something in the manual that I didn't see or understand to
> > get around this?  I tried a few web searches, and didn't find much
> > information...

Which version of wget do you use? I guess the last version that had this
problem was wget 1.5.3. Try 1.6 or 1.7 (but be warned that compiling 1.7
with SSL support often does not work properly yet).

More information about wget may be found at http://sunsite.dk/wget/

-- jan



Re: wget bug - after closing control connection

2001-03-08 Thread csaba . raduly


Which version of wget do you use? Are you aware that wget 1.6 has been
released and 1.7 is in development (and they contain a workaround for the
"Lying FTP server syndrome" you are seeing)?
--
Csaba Ráduly, Software Engineer  Sophos Anti-Virus
email: [EMAIL PROTECTED]   http://www.sophos.com
US support: +1 888 SOPHOS 9    UK Support: +44 1235 559933






Re: wget bug with following to a new location

2001-01-23 Thread Hack Kampbjørn

Volker Kuhlmann wrote:
> 
> I came across this bug in wget where it gives an error instead of
> following, as it should.
> 
> Volker
> 
> > wget --version
> GNU Wget 1.5.3
Hmm that's quite old ...
> 
> Copyright (C) 1995, 1996, 1997, 1998 Free Software Foundation, Inc.
> This program is distributed in the hope that it will be useful,
> but WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU General Public License for more details.
> 
> Written by Hrvoje Niksic <[EMAIL PROTECTED]>.
> 
> > cat ./Wns
> #!/bin/sh
> #
> exec wget -Y0 --proxy-pass=none --cache=off \
> -U 'Mozilla/4.7 [en] (WinNT; I)' \
> "$@"
-U that was in 1.5.3 release so you're using some patches ...

> 
> > ./Wns http://www.themestream.com/articles/298752.html -S
> --08:52:39--  http://www.themestream.com:80/articles/298752.html
>=> `www.themestream.com/articles/298752.html'
> Connecting to www.themestream.com:80... connected!
> HTTP request sent, awaiting response... 302 Found
> 2 Connection: close
> 3 Date: Sun, 21 Jan 2001 19:52:05 GMT
> 4 Server: Apache/1.3.12 (Unix)  (Red Hat/Linux)
> 5 Cache-Control: no-cache
> 6 Content-Type: text/html; charset=ISO-8859-1
> 7 Expires: 0
> 8 Location: /gspd_browse/browse/view_article.gsp?c_id=298752&id_list=&cookied=T

Aha, no absolute location; this is a common "bug" in so many web-sites
8-(

> 9 Pragma: no-cache
> 10 Set-Cookie: g-id=pfkocnholppacgapgkll.10024743; expires=Fri, 01-Jan-2010 20:00:
> 00 GMT; path=/
> 11 Set-Cookie: g-id=fnfafclcgnjdmpaccfda.10024748; expires=Fri, 01-Jan-2010 20:00:
> 00 GMT; path=/
> 12 Content-Length: 0
> 13
> Location: /gspd_browse/browse/view_article.gsp?c_id=298752&id_list=&cookied=T [fol
> lowing]
> /gspd_browse/browse/view_article.gsp?c_id=298752&id_list=&cookied=T: Unknown/unsup
> ported protocol.
> Exit 1

Of course unknown protocol ...

Now seriously: this "bug" has been fixed in release 1.6; look at the
web-site http://sunsite.dk/wget or on a GNU mirror near you (tm)

> 
> > ./Wns http://www.themestream.com/'gspd_browse/browse/view_article.gsp?c_id=29875
> 2&id_list=&cookied=T' -S
> --08:53:51--  http://www.themestream.com:80/gspd_browse/browse/view_article.gsp?c_
> id=298752&id_list=&cookied=T
>=> `www.themestream.com/gspd_browse/browse/view_article.gsp?c_id=298752
> &id_list=&cookied=T'
> Connecting to www.themestream.com:80... connected!
> HTTP request sent, awaiting response... 200 OK
> 2 Connection: close
> 3 Date: Sun, 21 Jan 2001 19:54:03 GMT
> 4 Server: Apache/1.3.12 (Unix)  (Red Hat/Linux) PHP/3.0.15 mod_perl/1.21
> 5 Cache-Control: no-cache
> 6 Content-Type: text/html; charset=ISO-8859-1
> 7 Expires: 0
> 8 Pragma: no-cache
> 9 Set-Cookie: g-id=lacchjobhehcipblkmbg.10024818; expires=Fri, 01-Jan-2010 20:00:0
> 0 GMT; path=/
> 10 Content-Length: 21735
> 11
> 
> 0K -> .. .. .[100%]
> 
> Last-modified header missing -- time-stamps turned off.
> 08:53:56 (7.52 KB/s) - `www.themestream.com/gspd_browse/browse/view_article.gsp?c_
> id=298752&id_list=&cookied=T' saved [21735/21735]

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn   [EMAIL PROTECTED]
HackLine +45 2031 7799