Wget only consults robots.txt when it is following links on its own,
i.e. during a recursive download of a site; a plain single-URL fetch is
treated as an explicit user request and is downloaded unconditionally:
"
Wget can follow links in HTML, XHTML, and CSS pages, to create local
versions of remote web sites, fully recreating the directory
structure
of the original site. This is sometimes referred to as "recursive
downloading." While doing that, Wget respects the Robot Exclusion
Standard (/robots.txt). Wget can be instructed to convert the links
in
downloaded files to point at the local files, for offline viewing.
"
/HH
2012/3/16 phil curb <[email protected]>
> I've made a robots.txt file, but wget doesn't seem to be respecting
> it; it always downloads.
> http://pastebin.com/raw.php?i=kt1mV2af
>
>
> C:\r>wget 127.0.0.1:56
> --2012-03-16 19:45:32--  http://127.0.0.1:56/
> Connecting to 127.0.0.1:56... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 3 [text/html]
> Saving to: `index.html'
> 100%[======================================>] 3  --.-K/s  in 0s
> 2012-03-16 19:45:32 (20.0 KB/s) - `index.html' saved [3/3]
>
> C:\r>wget 127.0.0.1:56/robots.txt
> --2012-03-16 19:45:43--  http://127.0.0.1:56/robots.txt
> Connecting to 127.0.0.1:56... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 26 [text/plain]
> Saving to: `robots.txt'
> 100%[======================================>] 26  --.-K/s  in 0s
> 2012-03-16 19:45:43 (175 KB/s) - `robots.txt' saved [26/26]
>
> C:\r>type robots.txt
> User-agent: *
> Disallow: /
> C:\r>