How does wget work in detail over HTTP? Binary file

2003-08-18 Thread marvin.ammet
Hi List,

How does the wget program work in detail (when getting a binary file)?

I want to write a minimalistic DOS batch file or VBScript
to download some *.gif files periodically to a
local filesystem folder.
Can I use telnet www.mysite.com 80 with a GET request

to get the input and write it into
a file with echo bla > myfile.gif?
Thanks

Marv
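
For context, an HTTP download of a binary file comes down to sending a GET request and saving, unchanged, the bytes that follow the blank line ending the response headers. Below is a minimal Python sketch of that idea, not wget's actual implementation. The host www.mysite.com comes from the question; the path /image.gif, the helper names, and the choice of HTTP/1.0 with "Connection: close" are assumptions made for illustration.

```python
import socket
from typing import Tuple


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request terminated by a blank line."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw HTTP response into (header block, body) at the first blank line."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end of headers found in response")
    return head, body


def fetch(host: str, path: str, out_file: str, port: int = 80) -> None:
    """Send the request, read until the server closes, and save the body as binary."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    head, body = split_response(b"".join(chunks))
    status_parts = head.split(b"\r\n", 1)[0].split()
    if len(status_parts) < 2 or status_parts[1] != b"200":
        raise RuntimeError(f"unexpected status line: {status_parts!r}")
    # "wb" writes the bytes as received, with no newline translation.
    with open(out_file, "wb") as f:
        f.write(body)
```

Under these assumptions, a call such as fetch("www.mysite.com", "/image.gif", "myfile.gif") would save the image. The key point is the binary write: the body has to reach the file byte for byte.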



wget is mirroring whole internet instead of just my web page!

2003-08-18 Thread Andrzej Kasperowicz
When I try to mirror web pages using the command:
wget -m -nv -k -K -nH -t 100 -o logchemfanpl -P public_html/mirror 
http://znik.wbc.lublin.pl/ChemFan/

wget mirrors not just the domain of the web page but seemingly the whole
internet...

There are robots.txt files, but they should not cause wget to
download from all available domains, I suppose?

So why is this happening, and how can I avoid it?

Regards
Andrzej.


unreasonable not to doc ascii vs. binary in the --help text

2003-08-18 Thread Mark David
When I look at the long help for wget, there's no mention of how to arrange
for ascii vs. binary download. It should be under FTP options.  I've used
FTP for almost 20 years, and the ASCII or BINARY commands are the two most
common commands outside of get and put.  I think it's pretty unreasonable
not to mention binary or ascii in the --help output at all.  For
simple file downloads on the command line, which is when you need this help
desk to help you, the way to specify binary is crucial.

For your reference, the full help text (and version info) follows at the end of
this message.

Thanks for your consideration,

Mark David

wget --help
GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      download URLs found in FILE.
  -F,  --force-html           treat input file as HTML.
  -B,  --base=URL             prepends URL to relative links in -F -i file.
       --sslcertfile=FILE     optional client certificate.
       --sslcertkey=KEYFILE   optional keyfile for this certificate.
       --egd-file=FILE        file name of the EGD socket.

Download:
       --bind-address=ADDRESS   bind to ADDRESS (hostname or IP) on local host.
  -t,  --tries=NUMBER           set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber             don't clobber existing files or use .# suffixes.
  -c,  --continue               resume getting a partially-downloaded file.
       --progress=TYPE          select progress gauge type.
  -N,  --timestamping           don't re-retrieve files unless newer than local.
  -S,  --server-response        print server response.
       --spider                 don't download anything.
  -T,  --timeout=SECONDS        set the read timeout to SECONDS.
  -w,  --wait=SECONDS           wait SECONDS between retrievals.
       --waitretry=SECONDS      wait 1...SECONDS between retries of a retrieval.
       --random-wait            wait from 0...2*WAIT secs between retrievals.
  -Y,  --proxy=on/off           turn proxy on or off.
  -Q,  --quota=NUMBER           set retrieval quota to NUMBER.
       --limit-rate=RATE        limit download rate to RATE.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER      set http user to USER.
       --http-passwd=PASS    set http password to PASS.
  -C,  --cache=on/off        (dis)allow server-cached data (normally allowed).
  -E,  --html-extension      save all text/html documents with .html extension.
       --ignore-length       ignore `Content-Length' header field.
       --header=STRING       insert STRING among the headers.
       --proxy-user=USER     set USER as proxy username.
       --proxy-passwd=PASS   set PASS as proxy password.
       --referer=URL         include `Referer: URL' header in HTTP request.
  -s,  --save-headers        save the HTTP headers to file.
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive  disable HTTP keep-alive (persistent connections).
       --cookies=off         don't use cookies.
       --load-cookies=FILE   load cookies from FILE before session.
       --save-cookies=FILE   save cookies to FILE after session.

FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off           turn file name globbing on or off.
       --passive-ftp           use the passive transfer mode.
       --retr-symlinks         when recursing, get linked-to files (not dirs).

Recursive retrieval:
  -r,  --recursive          recursive web-suck -- use with care!
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
       --delete-after       delete files locally after downloading them.
  -k,  --convert-links      convert non-relative links to relative.
  -K,  --backup-converted   before converting file X, back up as X.orig.
  -m,  --mirror             shortcut option equivalent to -r -N -l inf -nr.
  -p,  --page-requisites

RE: unreasonable not to doc ascii vs. binary in the --help text

2003-08-18 Thread Mark David
You said: "The type selection is rarely needed ..."

This is untrue. I just tried this out using wget on Windows.

If you don't tack ;type=a onto the end of the URL when transferring a text
file from Unix to Windows, the file's line endings will not be
converted from Unix (LF) to Windows (CRLF) conventions.

If you look at the file in applications that just follow Windows
conventions, e.g., Notepad, the lines will not be broken in the
display.  Some applications (e.g., web browsers) follow a
liberal interpretation of line endings, which helps overcome
this problem, but many do not, including programs that read
ascii (text) files as data and will silently but fatally malfunction
if the CR is not there in front of the LF.
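
The difference described above can be sketched in a few lines of Python. This only illustrates what an ASCII-mode transfer to Windows amounts to for a Unix text file; the function name is hypothetical, and it is not how wget or an FTP server implements the conversion.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF pairs alone."""
    # Normalize any CRLF to LF first so they are not doubled, then expand.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


unix_text = b"first line\nsecond line\n"
windows_text = lf_to_crlf(unix_text)
print(windows_text)  # b'first line\r\nsecond line\r\n'
```

A binary transfer would leave unix_text untouched, which is exactly why Notepad shows such a file as one long line.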

So, with Unix-to-Windows transfer of text files being obviously
an extremely common case, this clearly deserves a few lines in
your --help documentation.  It can hardly violate any length
limit for that text -- there seems to be none, since the
text goes on and on and documents such seldom-needed options
as passive mode:

  --passive-ftp   use the passive transfer mode.

And many others that don't deserve as much attention as ascii
vs. binary transfer.

Thanks,

Mark




RE: wget is mirroring whole internet instead of just my web page!

2003-08-18 Thread Post, Mark K
man wget shows:
   -D domain-list
   --domains=domain-list
       Set domains to be followed.  domain-list is a comma-separated
       list of domains.  Note that it does not turn on -H.
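
To make the idea of a comma-separated domain list concrete, here is a rough Python sketch of checking a link's host against such a list. It is an illustration under assumptions, not wget's code: the function name is hypothetical, www.example.com is a made-up foreign host, and matching on a dot boundary is a design choice of this sketch.

```python
from urllib.parse import urlsplit


def host_in_domains(url: str, domain_list: str) -> bool:
    """Return True if the URL's host is one of the listed domains or a subdomain."""
    host = (urlsplit(url).hostname or "").lower()
    domains = [d.strip().lower() for d in domain_list.split(",") if d.strip()]
    return any(host == d or host.endswith("." + d) for d in domains)


print(host_in_domains("http://znik.wbc.lublin.pl/ChemFan/", "wbc.lublin.pl"))  # True
print(host_in_domains("http://www.example.com/", "wbc.lublin.pl"))  # False
```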


Mark Post



RE: wget is mirroring whole internet instead of just my web page!

2003-08-18 Thread Andrzej Kasperowicz
On 18 Aug 2003 at 13:49, Post, Mark K wrote:

> man wget shows:
>    -D domain-list
>    --domains=domain-list
>        Set domains to be followed.  domain-list is a comma-separated
>        list of domains.  Note that it does not turn on -H.

Right, but by default wget should not follow links to all domains,
so why was it happening in this case?

I also tried to mirror another web site from the same server,
which also contains links to other domains:
wget -m -nv -k -K -nH -t 100 -o logmineraly -P public_html/mirror 
http://znik.wbc.lublin.pl/Mineraly/

and in this case it was not downloading from other domains.
So that's a mystery really.

Anyway, if I add -D wbc.lublin.pl, should it run correctly?
wget -m -nv -k -K -nH -t 100 -D wbc.lublin.pl -o logchemfanpl -P 
public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

ak


RE: wget is mirroring whole internet instead of just my web page!

2003-08-18 Thread Post, Mark K
It's always been my experience when specifying -m that wget does follow
across domains by default.  I've always had to tell it not to do that.

Mark Post



Re: unreasonable not to doc ascii vs. binary in the --help text

2003-08-18 Thread Maciej W. Rozycki
On Mon, 18 Aug 2003, Mark David wrote:

> When I look at the long help for wget, there's no mention of how to arrange
> for ascii vs. binary download. It should be under FTP options.  I've used
> FTP for almost 20 years, and the ASCII or BINARY commands are the two most
> common commands outside of get and put.  I think it's pretty unreasonable
> not to mention binary or ascii in the --help output at all.  For
> simple file downloads on the command line, which is when you need this help
> desk to help you, the way to specify binary is crucial.

The default download type wget uses is binary.  If you want another type,
then ";type=X" (X denotes the desired type; e.g. "i" is binary and "a"
is ASCII) can be appended to a URL.  It's all documented in wget's
info pages.  The type selection is rarely needed -- typically for
downloading a text file from an EBCDIC host -- so including it in the
short help reference would seem to be overkill.
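
As a small illustration of the URL suffix described above, the Python sketch below appends ;type=a or ;type=i to an FTP URL before it would be handed to wget. The helper name and the example host ftp.example.com are assumptions; only the ;type=X convention itself comes from this message.

```python
def with_ftp_type(url: str, type_code: str) -> str:
    """Return the FTP URL with a ;type=X suffix (a = ASCII, i = binary)."""
    if not url.startswith("ftp://"):
        raise ValueError("the ;type= suffix applies to ftp:// URLs")
    if type_code not in ("a", "i"):
        raise ValueError("this sketch only handles type 'a' or 'i'")
    # Drop an existing ;type=... suffix so it is not added twice.
    base = url.split(";type=", 1)[0]
    return f"{base};type={type_code}"


print(with_ftp_type("ftp://ftp.example.com/pub/readme.txt", "a"))
# ftp://ftp.example.com/pub/readme.txt;type=a
```

When such a URL is typed directly in a Unix shell, it should be quoted, because an unquoted semicolon ends the command.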

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+         e-mail: [EMAIL PROTECTED], PGP key available         +