AW: Problem mirroring a site using ftp over proxy

2008-08-07 Thread Juon, Stefan
...problem also exists with version 1.11.4. So what might cause wget not
to download the files after it has performed a LIST?

Thanks, Stefan

Juon, Stefan wrote:
 Hi there
 I'm trying to mirror an FTP site over a proxy (Sun Java Webproxy 4.0.4)
 using this wget command:

 export ftp_proxy=http://proxy.company.com:8080
 wget --follow-ftp --passive-ftp --proxy=on --mirror \
   --output-file=./logfile.wget ftp://ftpde.nai.com/CommonUpdater

What version of Wget are you running? If it's not the latest, please try
the current 1.11.4 release.

Please also try the --debug option, to see if Wget gives you more
information.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Connection management and pipelined Wget

2008-08-07 Thread Micah Cowan
Micah Cowan wrote:
 * A getter command is mentioned more than once in the above. Note that
 this is not mutually exclusive with the concept of letting a single
 process govern connection persistence, which would handle the real work;
 the getter would probably be a tool for communicating with the main driver.

...

 - Using existing tools to implement protocols Wget doesn't understand
 (want scp support? Just register it as an scp:// scheme handler), and
 instantly add support to Wget for the latest, greatest protocols without
 hacking Wget or waiting until we get around to implementing it.
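
One way to picture the scheme-handler idea is a small table that maps each URL scheme to an external command. The sketch below is purely illustrative: `handler_for` is a hypothetical name, the scheme-to-command pairs are assumptions, and no such registry exists in Wget.

```shell
# Hypothetical scheme-handler lookup; not an existing Wget feature.
# handler_for URL: print the external command that would fetch URL.
handler_for() {
    case $1 in
        scp://*)                    echo scp ;;
        http://*|https://*|ftp://*) echo wget ;;
        *) echo "no handler registered for: $1" >&2
           return 1 ;;
    esac
}
```

Adding a protocol would then mean adding one line to the table rather than changing the fetching code.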

Of course, one drawback is that it then becomes difficult to sanely
handle a feature for multiple simultaneous connections, or even
persistent connections, when outside programs come into play. Using a
getter we have control over, that can communicate with a
connection-managing program, would allow this to work, but that won't
work with outside programs that aren't in the know, such as the scp
command, or other getter programs. You can fork multiple scps for
multiple connections, but what will keep the number of simultaneous
connections to a reasonable limit?
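
For getters that know nothing about a connection manager, about the only thing a driver can cap is the number of processes it starts. A minimal sketch, assuming an xargs that supports -P (GNU and BSD xargs do; it is not in POSIX); `fetch_all` is a hypothetical helper name, and URLs are assumed to contain no whitespace or quote characters:

```shell
# fetch_all MAX CMD [ARGS...]
# Reads one URL per line on stdin and runs "CMD ARGS... URL" for each,
# keeping at most MAX processes running at once.
# Assumes an xargs with -P (a GNU/BSD extension, not POSIX).
fetch_all() {
    max=$1
    shift
    xargs -n 1 -P "$max" "$@"
}

# Example (not run here): at most 4 wget processes at a time.
#   fetch_all 4 wget -nc < urls.txt
```

This bounds the process count for a single driver only. It does nothing for connection reuse between getters, which is the harder problem.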

Plus, even the idea of our own getter program communicating via a Unix
socket or some such to a connections manager program, irks me: it
obliterates the independence that makes pipelines useful. I guess, to be
useful, a pipelined Wget would need to have wholly independent tools;
but the loss of persistent connections would be too great a loss to
bear, I think (not that Wget handles them particularly well now:
HTTP/1.1 should significantly improve it, though).

Still, there were already plans to allow arbitrary content handler
commands, and URL filters; we can certainly continue to move in that
direction. We could still split off the HTML and CSS parsers as
completely autonomous (and interchangeable with alternatives) programs.
But it seems to me that content-_fetching_ (protocol support) will need
to continue to be fully integrated in Wget's core. Decisions on whether
URLs are followed or not could also be outsourced.

Previously, I said that we might lose Windows support by making Wget
more pipeline-y; but that's not necessarily true. It's just harder to
implement in Windows, but can be done. Hell, if need be, we could have
Wget write input to a file, then have the parser read it and spit out
another file. That's obviously lame, but OTOH it's how Wget already
parses HTML currently (except that no additional programs are used). I
suspect, though, that such a program would see a Unix-oriented release
some time before the Windows port would appear; unless there were
ongoing collaboration on a Windows port simultaneous to the Unix-ish
development.

If in fact everything except for connections could be handled as an
external command, then there might be little advantage to be gained by
library-izing Wget, and it might make more sense to leave Wget as a
program, and let connection handlers be plugins (which are expected
to use Wget's connection management system, rather than direct connections).

Such a project should still probably get a new name (I was going to say
"be a fork", but it'd probably be a rearchitecture anyway, with little
in common with current Wget); Wget proper should continue to be a project
that appeals to folks who need a tool that's sufficiently lightweight
to install as a core system component, without a lot of fluff (or at
least, not too much more fluff than it already has).

BTW, I added a couple new name concepts to
http://wget.addictivecode.org/Wget2Names: xget (x being the letter
after w), and niwt (which I like best so far: Nifty Integrated Web Tools).



Re: Connection management and pipelined Wget

2008-08-07 Thread Daniel Stenberg

On Thu, 7 Aug 2008, Micah Cowan wrote:


niwt (which I like best so far: Nifty Integrated Web Tools).


But the grand question is: how would that be pronounced? Like newt? :-)

--

 / daniel.haxx.se


Re: Connection management and pipelined Wget

2008-08-07 Thread Micah Cowan
Daniel Stenberg wrote:
 On Thu, 7 Aug 2008, Micah Cowan wrote:
 
 niwt (which I like best so far: Nifty Integrated Web Tools).
 
 But the grand question is: how would that be pronounced? Like newt? :-)

That was my thinking :)



Re: AW: Problem mirroring a site using ftp over proxy

2008-08-07 Thread Micah Cowan
Well, considering that FTP proxied over HTTP is working fine for me,
it's probably more a matter of the index.html file that's generated by
the proxy (since one can't do a true LIST over a proxy). Perhaps you
could supply the index.html files that are being generated (be sure to
clean out any sensitive info first).

It might also be informative to know what server program is doing the
proxying.

-Micah

Juon, Stefan wrote:
 ...problem also exists with version 1.11.4. So what might cause wget not
 to download the files after it has performed a LIST?
 
 Thanks, Stefan
 
 Juon, Stefan wrote:
 Hi there
 I'm trying to mirror an FTP site over a proxy (Sun Java Webproxy 4.0.4)
 using this wget command:

 export ftp_proxy=http://proxy.company.com:8080
 wget --follow-ftp --passive-ftp --proxy=on --mirror \
   --output-file=./logfile.wget ftp://ftpde.nai.com/CommonUpdater
 
 What version of Wget are you running? If it's not the latest, please try
 the current 1.11.4 release.
 
 Please also try the --debug option, to see if Wget gives you more
 information.
 



Re: WGET Date-Time

2008-08-07 Thread Micah Cowan
Andreas Weller wrote:
 Hi!
 I use wget to download files from an FTP server in a bash script.
 For example:
 touch last.time
 wget -nc ftp://[]/*.txt .
 find -newer last.time
 
 This fails if the files on the FTP server are older than my last.time. So I
 want wget to set the file date/time to the local creation time, not the
 server's...
 
 How to do this?

You can't, currently. This behavior is intended to support Wget's
timestamping (-N) functionality.

However, I'd accept a patch for an option that disables this.
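
Until such an option exists, a script can avoid relying on timestamps by comparing directory listings taken before and after the download. A minimal sketch; `snapshot` and `new_since` are hypothetical helper names, and file names are assumed not to contain newlines:

```shell
# snapshot DIR: print the regular files under DIR, one per line, sorted.
snapshot() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# new_since LIST DIR: print files now under DIR that are absent from LIST,
# where LIST is a file written earlier by "snapshot DIR".
new_since() {
    snapshot "$2" | LC_ALL=C comm -13 "$1" -
}

# Intended use (not run here; the host is left elided as in the original):
#   before=$(mktemp)
#   snapshot . > "$before"
#   wget -nc 'ftp://[]/*.txt'
#   new_since "$before" .
```

The output of `new_since` takes the place of the `find -newer` step. Piping it through `touch` would also give the new files a local modification time, if that step is to be kept.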
