How does wget work in detail over HTTP? (binary file)
Hi List, how does the wget program work in detail when getting a binary file? I want to write a minimalistic DOS batch file or VBScript to periodically download some *.gif files to a local filesystem folder. Can I use telnet www.mysite.com 80 with a GET request to fetch the data, and write it into a file with echo (e.g. echo ... > myfile.gif)? Thanks, Marv
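Roughly, what I understand wget to do for a single file, sketched here in Python just to show the mechanics (the host and path are placeholders I made up): it sends an HTTP GET, reads the raw response body, and writes the bytes unchanged in binary mode.

    # minimal sketch of a single HTTP GET of a binary file (placeholder URL)
    import urllib.request

    url = "http://www.mysite.com/images/picture.gif"

    with urllib.request.urlopen(url) as response:
        data = response.read()              # raw response body, untouched bytes

    with open("picture.gif", "wb") as out:  # "wb": write bytes as-is, no text translation
        out.write(data)

The part I doubt echo can reproduce is the binary write: a GIF contains arbitrary bytes, and anything that goes through text echoing or console redirection is liable to mangle them.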
wget is mirroring the whole internet instead of just my web page!
When I try to mirror web pages using the command:

wget -m -nv -k -K -nH -t 100 -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

wget mirrors not just the domain of the web page but the whole internet... There is a robots.txt file, but I suppose it should not cause wget to download from all available domains? So why is this happening, and how can I avoid it? Regards, Andrzej.
unreasonable not to doc ascii vs. binary in the --help text
When I look at the long help for wget, there's no mention of how to arrange for ascii vs. binary download. It should be under FTP options. I've used FTP for almost 20 years, and the ASCII and BINARY commands are the two most common commands outside of get and put. I think it's pretty unreasonable not to mention binary or ascii in the --help output at all. For simple file downloads on the command line, which is when you need this help text to help you, the way to specify binary is crucial. For your reference, the full help text (with version info) follows at the end of this message. Thanks for your consideration, Mark David

wget --help
GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      download URLs found in FILE.
  -F,  --force-html           treat input file as HTML.
  -B,  --base=URL             prepends URL to relative links in -F -i file.
       --sslcertfile=FILE     optional client certificate.
       --sslcertkey=KEYFILE   optional keyfile for this certificate.
       --egd-file=FILE        file name of the EGD socket.

Download:
       --bind-address=ADDRESS   bind to ADDRESS (hostname or IP) on local host.
  -t,  --tries=NUMBER          set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE  write documents to FILE.
  -nc, --no-clobber            don't clobber existing files or use .# suffixes.
  -c,  --continue              resume getting a partially-downloaded file.
       --progress=TYPE         select progress gauge type.
  -N,  --timestamping          don't re-retrieve files unless newer than local.
  -S,  --server-response       print server response.
       --spider                don't download anything.
  -T,  --timeout=SECONDS       set the read timeout to SECONDS.
  -w,  --wait=SECONDS          wait SECONDS between retrievals.
       --waitretry=SECONDS     wait 1...SECONDS between retries of a retrieval.
       --random-wait           wait from 0...2*WAIT secs between retrievals.
  -Y,  --proxy=on/off          turn proxy on or off.
  -Q,  --quota=NUMBER          set retrieval quota to NUMBER.
       --limit-rate=RATE       limit download rate to RATE.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER      set http user to USER.
       --http-passwd=PASS    set http password to PASS.
  -C,  --cache=on/off        (dis)allow server-cached data (normally allowed).
  -E,  --html-extension      save all text/html documents with .html extension.
       --ignore-length       ignore `Content-Length' header field.
       --header=STRING       insert STRING among the headers.
       --proxy-user=USER     set USER as proxy username.
       --proxy-passwd=PASS   set PASS as proxy password.
       --referer=URL         include `Referer: URL' header in HTTP request.
  -s,  --save-headers        save the HTTP headers to file.
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive  disable HTTP keep-alive (persistent connections).
       --cookies=off         don't use cookies.
       --load-cookies=FILE   load cookies from FILE before session.
       --save-cookies=FILE   save cookies to FILE after session.

FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off           turn file name globbing on or off.
       --passive-ftp           use the passive transfer mode.
       --retr-symlinks         when recursing, get linked-to files (not dirs).

Recursive retrieval:
  -r,  --recursive          recursive web-suck -- use with care!
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
       --delete-after       delete files locally after downloading them.
  -k,  --convert-links      convert non-relative links to relative.
  -K,  --backup-converted   before converting file X, back up as X.orig.
  -m,  --mirror             shortcut option equivalent to -r -N -l inf -nr.
  -p,  --page-requisites
RE: unreasonable not to doc ascii vs. binary in the --help text
You said: "The type selection is rarely needed ..." This is untrue. I just tried this out using wget on Windows. If you don't tack ;type=a onto the end of the URL when transferring a text file from unix to Windows, the file's line endings will not be converted from unix (LF) to Windows (CRLF) conventions. If you look at the file in applications that simply follow Windows conventions, e.g., Notepad, the lines will not be broken in the display. Some applications (e.g., web browsers) follow a liberal interpretation of line endings, which helps overcome this problem, but many do not, including programs that read ascii (text) files as data, and these will silently but fatally malfunction if the CR is not there in front of the LF.

So, with unix-to-Windows transfer of text files obviously being an extremely common case, this clearly deserves a few lines in your --help documentation. It can hardly violate any length limit for that text -- there seems to be none, since the text goes on and on and documents such seldom-needed options as passive mode:

     --passive-ftp           use the passive transfer mode.

And many others that don't deserve as much attention as ascii vs. binary transfer. Thanks, Mark

-----Original Message-----
From: Maciej W. Rozycki [mailto:[EMAIL PROTECTED]]
Sent: Mon, August 18, 2003 11:22 AM
To: Mark David
Cc: '[EMAIL PROTECTED]'
Subject: Re: unreasonable not to doc ascii vs. binary in the --help text

On Mon, 18 Aug 2003, Mark David wrote:

 When I look at the long help for wget, there's no mention of how to arrange for ascii vs. binary download. It should be under FTP options. I've used FTP for almost 20 years, and the ASCII and BINARY commands are the two most common commands outside of get and put. I think it's pretty unreasonable not to mention binary or ascii in the --help output at all. For simple file downloads on the command line, which is when you need this help text to help you, the way to specify binary is crucial.

The default download type wget uses is binary. If you want another type, then ;type=X (X denotes the desired type; e.g. i is binary and a is ASCII) can be appended to a URL. It's all documented in wget's info pages. The type selection is rarely needed -- typically for downloading a text file from an EBCDIC host -- so including it in the short help reference would seem to be overkill.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+ e-mail: [EMAIL PROTECTED], PGP key available +
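P.S. To spell out the syntax Maciej describes, the type code is appended to the URL itself; for example (the host and path here are just placeholders):

    wget "ftp://ftp.example.com/pub/notes.txt;type=a"    (ASCII mode: LF converted to CRLF on Windows)
    wget "ftp://ftp.example.com/pub/photo.gif;type=i"    (binary/image mode: bytes left untouched)

Note that in a Unix shell the URL needs the quotes, since a bare ; would otherwise end the command.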
RE: wget is mirroring the whole internet instead of just my web page!
man wget shows:

  -D domain-list
  --domains=domain-list
      Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

Mark Post

-----Original Message-----
From: Andrzej Kasperowicz [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 18, 2003 8:38 AM
To: [EMAIL PROTECTED]
Subject: wget is mirroring the whole internet instead of just my web page!

When I try to mirror web pages using the command:

wget -m -nv -k -K -nH -t 100 -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

wget mirrors not just the domain of the web page but the whole internet... There is a robots.txt file, but I suppose it should not cause wget to download from all available domains? So why is this happening, and how can I avoid it? Regards, Andrzej.
RE: wget is mirroring the whole internet instead of just my web page!
On 18 Aug 2003 at 13:49, Post, Mark K wrote:

 man wget shows:
  -D domain-list
  --domains=domain-list
      Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

Right, but by default wget should not follow all domains, so why was it happening in this case? I also tried to mirror another web site from the same server, also containing links to other domains:

wget -m -nv -k -K -nH -t 100 -o logmineraly -P public_html/mirror http://znik.wbc.lublin.pl/Mineraly/

and in this case it was not downloading from other domains. So that's a real mystery. Anyway, if I add -D wbc.lublin.pl, should it then run correctly?

wget -m -nv -k -K -nH -t 100 -D wbc.lublin.pl -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

ak
RE: wget is mirroring the whole internet instead of just my web page!
It's always been my experience when specifying -m that wget does follow across domains by default. I've always had to tell it not to do that.

Mark Post

-----Original Message-----
From: Andrzej Kasperowicz [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 18, 2003 4:02 PM
To: Post, Mark K; [EMAIL PROTECTED]
Subject: RE: wget is mirroring the whole internet instead of just my web page!

On 18 Aug 2003 at 13:49, Post, Mark K wrote:

 man wget shows:
  -D domain-list
  --domains=domain-list
      Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

Right, but by default wget should not follow all domains, so why was it happening in this case? I also tried to mirror another web site from the same server, also containing links to other domains:

wget -m -nv -k -K -nH -t 100 -o logmineraly -P public_html/mirror http://znik.wbc.lublin.pl/Mineraly/

and in this case it was not downloading from other domains. So that's a real mystery. Anyway, if I add -D wbc.lublin.pl, should it then run correctly?

wget -m -nv -k -K -nH -t 100 -D wbc.lublin.pl -o logchemfanpl -P public_html/mirror http://znik.wbc.lublin.pl/ChemFan/

ak
Re: unreasonable not to doc ascii vs. binary in the --help text
On Mon, 18 Aug 2003, Mark David wrote:

 When I look at the long help for wget, there's no mention of how to arrange for ascii vs. binary download. It should be under FTP options. I've used FTP for almost 20 years, and the ASCII and BINARY commands are the two most common commands outside of get and put. I think it's pretty unreasonable not to mention binary or ascii in the --help output at all. For simple file downloads on the command line, which is when you need this help text to help you, the way to specify binary is crucial.

The default download type wget uses is binary. If you want another type, then ;type=X (X denotes the desired type; e.g. i is binary and a is ASCII) can be appended to a URL. It's all documented in wget's info pages. The type selection is rarely needed -- typically for downloading a text file from an EBCDIC host -- so including it in the short help reference would seem to be overkill.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+ e-mail: [EMAIL PROTECTED], PGP key available +