If you run Wireshark to see the actual client-server exchange, this
is what you get:
~
// __ while using wget
~
GET /files/11/11.zip HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.gutenberg.org
Connection: Keep-Alive
HTTP/1.1 403 Forbidden
Date: Fri, 22 Jun 2012 19:42:10 GMT
Server: Apache
Connection: close
Expires: Sun, 03 Oct 2004 12:00:00 GMT
Cache-Control: no-cache
X-Frame-Options: sameorigin
Content-Type: text/html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>403 Access Forbidden</title>
</head>
<body>
<h1>403 Access Forbidden</h1>
<div style="margin: 0 10%; border: 1px solid red; padding: 1em">
<p>The Project Gutenberg Web Site is for human (non-automated) users only.
Any perceived use of automated tools to access our web site
will result in a temporary or permanent block of your IP address
or subnet.</p>
<p>To protect our human users we have now
<strong>blocked all access from hosting services.</strong></p>
<p>If you think you need to download all our books,
then use one of our mirrors nearest you:
See: <a href="/MIRRORS.ALL">list of PG mirrors</a> and
<a href="/terms_of_use/">PG terms of use</a>.</p>
</div>
<h2>Requested URI</h2>
<p>/files/11/11.zip</p>
<h2>Local time</h2>
<p>Fri, 22 Jun 2012 15:42:10 -0400</p>
<h2>IP Address</h2>
<p>74.125.226.225</p>
<h2>Browser</h2>
<p>Wget/1.12 (linux-gnu)</p>
<h2>Referrer</h2>
<p></p>
<h2>Server Protocol</h2>
<p>HTTP/1.0</p>
<h2>Accept Headers</h2>
<h3>Accept</h3>
<p>*/*</p>
<h3>Accept Charset</h3>
<p></p>
<h3>Accept Encoding</h3>
<p></p>
<h3>Accept Language</h3>
<p></p>
</body>
</html>
~
// __ while using HttpClient
~
GET /files/11/11.zip HTTP/1.1
Host: www.gutenberg.org
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)
HTTP/1.1 200 OK
Date: Fri, 22 Jun 2012 19:39:45 GMT
Server: Apache
Last-Modified: Tue, 20 Dec 2011 16:01:58 GMT
ETag: "4a66bf-ed3b-4b4882fd6c980"
Accept-Ranges: bytes
Content-Length: 60731
Cache-Control: max-age=604800
Expires: Fri, 29 Jun 2012 19:39:45 GMT
X-Frame-Options: sameorigin
Keep-Alive: timeout=5, max=190
Connection: Keep-Alive
Content-Type: application/zip
...
< even if you only asked for the response headers, the server sends
the payload along >
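~
To make the comparison between the two captures concrete, here is a
minimal sketch that diffs the request headers copied from the traces
above. The class and method names (HeaderDiff, parseHeaders, diff) are
made up for illustration; it uses only the JDK and makes no claim about
which difference the server actually keys on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class HeaderDiff {

    // Request headers as they appear in the wget capture.
    static final String WGET =
        "GET /files/11/11.zip HTTP/1.0\n"
        + "User-Agent: Wget/1.12 (linux-gnu)\n"
        + "Accept: */*\n"
        + "Host: www.gutenberg.org\n"
        + "Connection: Keep-Alive";

    // Request headers as they appear in the HttpClient capture.
    static final String HTTPCLIENT =
        "GET /files/11/11.zip HTTP/1.1\n"
        + "Host: www.gutenberg.org\n"
        + "Connection: Keep-Alive\n"
        + "User-Agent: Apache-HttpClient/4.2 (java 1.5)";

    /** Parses "Name: value" lines into a map keyed by lower-cased header name. */
    static Map<String, String> parseHeaders(String block) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : block.split("\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no header name here: the request line or a blank line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Lists headers whose values differ, or that only one side sent. */
    static List<String> diff(Map<String, String> a, Map<String, String> b) {
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String left = a.get(name);
            String right = b.get(name);
            if (!Objects.equals(left, right)) {
                out.add(name + ": " + (left == null ? "(absent)" : left)
                        + " | " + (right == null ? "(absent)" : right));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : diff(parseHeaders(WGET), parseHeaders(HTTPCLIENT))) {
            System.out.println(line);
        }
    }
}
```

Run against the two captures, this reports only `accept` (sent by wget,
absent from HttpClient) and `user-agent`. The sketch skips the request
line, which also differs (HTTP/1.0 versus HTTP/1.1).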
~
Also, it seems that HttpClient, instead of getting a 403, seamlessly
follows the redirection as requested, but why isn't that negotiation
reported?
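~
The HttpClient capture above shows a 200 as the first response, so one
way to check whether any redirect is involved is to switch off
automatic following and walk the chain by hand. The following is a
rough sketch using the JDK's java.net.HttpURLConnection rather than
HttpClient itself; the class name RedirectTrace, the hop limit and the
choice of HEAD requests are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class RedirectTrace {

    /** True for the status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    /** Requests each hop with redirect following disabled and records what came back. */
    static List<String> trace(String start, String userAgent, int maxHops)
            throws IOException {
        List<String> hops = new ArrayList<>();
        URI uri = URI.create(start);
        for (int i = 0; i < maxHops; i++) {
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // report redirects instead of following them
            conn.setRequestMethod("HEAD");          // assumption: headers only, no payload
            conn.setRequestProperty("User-Agent", userAgent);
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            hops.add(status + " " + uri + (location != null ? " -> " + location : ""));
            if (!isRedirect(status) || location == null) {
                break; // final response reached
            }
            uri = uri.resolve(location); // Location may be relative
        }
        return hops;
    }

    public static void main(String[] args) throws IOException {
        // Pass the URL and the User-Agent string to test on the command line.
        for (String hop : trace(args[0], args[1], 5)) {
            System.out.println(hop);
        }
    }
}
```

If the list contains a single entry, no redirect took place for that
User-Agent, and there was no negotiation to report. A server may treat
HEAD and GET differently, so switch the method to GET if the results do
not match the capture.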
~
lbrtchx