On Thu, 26 Jul 2001, Alex Venn wrote:
> HEAD is hardly ever used and, on my website at least, never by Arachne.
True. I was partly thinking in terms of HTTP 1.0 and
mixing that up with Arachne's cache hierarchy, in
which the same info obtained by a HEAD command is
stored in the *.http file (*.HTP in DOS).
> 99.5% of all access is by GET requests.
Well... certainly not on my machine:
[root@wizard httpd]# cat access_log* | grep HEAD | wc -l
123
[root@wizard httpd]# cat access_log* | wc -l
3126
So a FULL 4% (123 of 3126, or about 3.9%) of the requests to my server are HEADs.
;-)
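One caveat on that count: "grep HEAD" matches the string anywhere on a line, so a URL or referrer that happens to contain it (say, a hypothetical /HEADER.gif) is counted too. A minimal Python sketch that tallies only the request method, assuming Apache's Common/Combined Log Format where the request line is the first double-quoted field. The log lines below are made up for illustration, and count_methods and share are hypothetical helper names:

```python
import re
from collections import Counter

# Assumes Apache Common/Combined Log Format: the request line
# ("METHOD /path HTTP/x.y") is the first double-quoted field on each line.
REQUEST_RE = re.compile(r'"([A-Z]+) [^"]*"')

def count_methods(lines):
    """Tally HTTP methods taken from the request field of each log line."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def share(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)

# Hypothetical log lines, made up for illustration (not from the real log).
sample = [
    '10.0.0.1 - - [26/Jul/2001:20:35:15 +0000] "GET / HTTP/1.0" 200 2744',
    '10.0.0.2 - - [26/Jul/2001:20:37:59 +0000] "HEAD / HTTP/1.0" 200 -',
    '10.0.0.3 - - [26/Jul/2001:20:40:00 +0000] "GET /HEADER.gif HTTP/1.0" 200 512',
]

counts = count_methods(sample)
print(counts["HEAD"], sum(counts.values()))      # 1 3
print(sum("HEAD" in line for line in sample))    # 2: what a plain grep counts
print(share(123, 3126))                          # 3.9
```

To check the real figures, feed it the lines of each access_log* file and compare the HEAD tally against the grep result.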
> If all HTML files are cached to disk, they can obviously be reviewed
> offline and it's comparatively easy to track a site's access history if
> either the files are saved with sequential local names
You don't actually use Arachne, do you? Yes,
cache filenames are assigned by time: the number of
seconds since 00:00:00 UTC on 1 January 1970 (the Unix epoch).
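If the cache names are Unix timestamps, each name gives the download time directly, and sorting the names numerically gives the access history in order. A small Python sketch; cache_time and access_history are hypothetical helper names, and reading the stem as a Unix timestamp is an assumption based on the naming described above:

```python
import os
from datetime import datetime, timezone

def cache_time(filename):
    """Decode a cache name such as '996179725.HTM' into a UTC datetime.

    Assumes the numeric stem is a Unix timestamp: seconds since
    1970-01-01 00:00:00 UTC.
    """
    stem = os.path.basename(filename).split(".", 1)[0]
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)

def access_history(filenames):
    """Sort cache files oldest-first by the timestamp in their names.

    Sorting by value rather than as strings keeps the order right when the
    count reaches 1,000,000,000 (2001-09-09 01:46:40 UTC) and names grow
    from 9 to 10 digits.
    """
    return sorted(filenames, key=cache_time)

print(cache_time("/home/steve/.arachne/cache/996179725.HTM"))
# 2001-07-26 20:35:25+00:00
```

That decodes to ten seconds after the server's Date: header (20:35:15 GMT) in the cached .http file quoted below, which is close enough given that client and server clocks need not agree.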
> An additional trick I like is for each file to have it's URL added at
> the top of the cached file in a comment.
Contents of 996179725.HTM.http:
<TITLE>HTTP header of http://wizard.dyndns.org/</TITLE><PRE>
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:35:15 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
</PRE><HR>URL:<A
HREF="http://wizard.dyndns.org/">http://wizard.dyndns.org/</A><BR>Local:<A
HREF="file:/home/steve/.arachne/cache/996179725.HTM">/home/steve/.arachne/cache/996179725.HTM</A><HR>
So you can see that the *.http file relates the original
URL to the cached filename, which is how the
image links are maintained within the cache.
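Going the other way, an index from original URL to cached copy could be rebuilt by reading the URL: and Local: links out of each *.http file. A rough Python sketch, assuming every *.http file follows the layout of the single sample above; the regex, the helper names, and the latin-1 encoding are all assumptions, not Arachne's own code:

```python
import re
from pathlib import Path

# Assumed layout, taken from the one sample file: 'URL:<A HREF="...">' and
# 'Local:<A HREF="...">', possibly with a line break between <A and HREF.
LINK_RE = re.compile(r'(URL|Local):<A\s+HREF="([^"]*)"', re.IGNORECASE)

def cache_entry(http_text):
    """Return (original_url, local_href) from one *.http header file."""
    links = {label.lower(): href for label, href in LINK_RE.findall(http_text)}
    return links.get("url"), links.get("local")

def build_index(cache_dir, pattern="*.http"):
    """Map original URL -> cached copy for every header file in cache_dir.

    Use pattern="*.HTP" for the DOS version's 8.3 names.
    """
    index = {}
    for path in Path(cache_dir).glob(pattern):
        url, local = cache_entry(path.read_text(encoding="latin-1"))
        if url and local:
            index[url] = local
    return index

sample = ('</PRE><HR>URL:<A\n'
          'HREF="http://wizard.dyndns.org/">http://wizard.dyndns.org/</A><BR>Local:<A\n'
          'HREF="file:/home/steve/.arachne/cache/996179725.HTM">'
          '/home/steve/.arachne/cache/996179725.HTM</A><HR>')

print(cache_entry(sample))
# ('http://wizard.dyndns.org/', 'file:/home/steve/.arachne/cache/996179725.HTM')
```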
(Also note that the header info in the *.http file is
nearly identical to the info returned by the HEAD
command, with only the Date: line differing; hence my
mistaken earlier statement that a HEAD command always
precedes a GET command.)
Info returned by the HEAD command:
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:37:59 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
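The near-identity can be checked mechanically. HTTP/1.1 says the headers sent in reply to a HEAD SHOULD match what a GET would send, so only time-dependent fields are expected to differ. A small Python sketch that parses the two header listings quoted above and reports which fields differ (parse_headers and differing_fields are illustrative names, not part of any library):

```python
def parse_headers(block):
    """Split a raw response header block into (status_line, {name: value}).

    Header names are lower-cased because HTTP treats them case-insensitively.
    """
    lines = [line for line in block.strip().splitlines() if line.strip()]
    fields = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return lines[0], fields

def differing_fields(a, b):
    """Sorted names of header fields whose values differ between two responses."""
    fa, fb = a[1], b[1]
    return sorted(k for k in fa.keys() | fb.keys() if fa.get(k) != fb.get(k))

# Header block as stored in 996179725.HTM.http.
cached = """\
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:35:15 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
"""

# Headers returned by the separate HEAD request.
head = """\
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:37:59 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
"""

print(differing_fields(parse_headers(cached), parse_headers(head)))  # ['date']
```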
And for those who really want to know more about the
HTTP/1.1 protocol (it's long; RFC 2068 has since been obsoleted by RFC 2616):
http://www.w3.org/Protocols/rfc2068/rfc2068
- Steve