On Thu, 26 Jul 2001, Alex Venn wrote:

> HEAD is hardly ever used and, on my website at least, never by Arachne.

  True.  I was thinking partly in HTTP 1.0 terms and
crossing that up with Arachne's cache hierarchy, in
which the same info you would get from a HEAD request
is stored in the *.http file (*.HTP in DOS).

> 99.5% of all access is by GET requests.

  Well... certainly not on my machine:

[root@wizard httpd]# cat access_log* | grep HEAD | wc -l
    123
[root@wizard httpd]# cat access_log* | wc -l
   3126

  So, a FULL 4% of the requests to my server are HEADs.
;-)
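
  One caveat on the count above: a plain grep for HEAD also matches any
log line that merely contains that string (a file called HEAD.html, say).
A minimal sketch of a stricter tally, assuming the log is in Apache's
Common Log Format with the request line in double quotes. The sample
lines and the names count_methods and head_share are made up for
illustration; they are not taken from my real log.

```python
import re
from collections import Counter

# Matches the quoted request line, e.g. "GET /index.html HTTP/1.0".
# Assumes Common Log Format; other log layouts would need another pattern.
REQUEST_RE = re.compile(r'"([A-Z]+) \S+(?: HTTP/[\d.]+)?"')


def count_methods(lines):
    """Tally HTTP methods from an iterable of access-log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def head_share(counts):
    """Return the percentage of counted requests that used HEAD."""
    total = sum(counts.values())
    return 100.0 * counts["HEAD"] / total if total else 0.0


# Hypothetical sample lines, invented for this example.
sample = [
    '10.0.0.1 - - [26/Jul/2001:20:35:15 +0000] "GET / HTTP/1.1" 200 2744',
    '10.0.0.2 - - [26/Jul/2001:20:37:59 +0000] "HEAD / HTTP/1.1" 200 0',
    '10.0.0.3 - - [26/Jul/2001:20:40:00 +0000] "GET /HEAD.html HTTP/1.0" 200 512',
]
counts = count_methods(sample)
print(counts["GET"], counts["HEAD"])
```

  The third sample line would be counted by grep HEAD but is really a
GET, which is why the sketch looks at the method field only.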

> If all HTML files are cached to disk, they can obviously be reviewed 
> offline and it's comparatively easy to track a site's access history if
> either the files are saved with sequential local names 

  You don't actually use Arachne, do you...  Yes, 
cache filenames are assigned by time -- the number of 
seconds since 00:00:00 UTC on 1 Jan 1970 (the Unix epoch).  
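
  For anyone who wants to check that, here is a small sketch that turns
a cache filename back into a date. It assumes the numeric part of the
name is a plain Unix timestamp in seconds; the helper name
cache_name_to_utc is just something I made up for the example.

```python
from datetime import datetime, timezone


def cache_name_to_utc(filename):
    """Interpret the numeric stem of a cache filename as Unix seconds."""
    stem = filename.split(".", 1)[0]
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)


stamp = cache_name_to_utc("996179725.HTM")
print(stamp.isoformat())  # 2001-07-26T20:35:25+00:00
```

  That lands within a few seconds of the Date: header stored for the
same page, which fits the seconds-since-1970 reading.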

> An additional trick I like is for each file to have its URL added at 
> the top of the cached file in a comment.

Contents of 996179725.HTM.http:

<TITLE>HTTP header of http://wizard.dyndns.org/</TITLE><PRE>
HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:35:15 GMT
Server: Apache/1.3.12 (Unix)  (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
</PRE><HR>URL:<A
HREF="http://wizard.dyndns.org/">http://wizard.dyndns.org/</A><BR>Local:<A
HREF="file:/home/steve/.arachne/cache/996179725.HTM">/home/steve/.arachne/cache/996179725.HTM</A><HR>

  So, you can see the *.http file relates the original
URL to the cached filename... which is how the 
image links are maintained within the cache.
(Also note that the info in the *.http file is nearly
identical to the info returned by the HEAD command,
hence my mistaken earlier statement that a HEAD 
command always precedes a GET command.)
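
  To make the URL-to-cache-file mapping concrete, here is a rough sketch
that pulls both values out of the footer of a *.http file. It only
assumes the footer looks like the one quoted above; I have not checked
it against other Arachne versions, and parse_http_sidecar is a name
invented for this example.

```python
import re

# The footer carries two links: the original URL and the local cache file.
# \s+ lets the pattern match even when the tag is wrapped across lines.
URL_RE = re.compile(r'URL:<A\s+HREF="([^"]+)"', re.IGNORECASE)
LOCAL_RE = re.compile(r'Local:<A\s+HREF="file:([^"]+)"', re.IGNORECASE)


def parse_http_sidecar(text):
    """Return (original_url, local_path) from the text of a *.http file."""
    url = URL_RE.search(text)
    local = LOCAL_RE.search(text)
    if not (url and local):
        raise ValueError("no URL/Local footer found")
    return url.group(1), local.group(1)


# Footer copied from the 996179725.HTM.http example.
footer = (
    '</PRE><HR>URL:<A\n'
    'HREF="http://wizard.dyndns.org/">http://wizard.dyndns.org/</A><BR>Local:<A\n'
    'HREF="file:/home/steve/.arachne/cache/996179725.HTM">'
    '/home/steve/.arachne/cache/996179725.HTM</A><HR>'
)
url, local = parse_http_sidecar(footer)
print(url, "->", local)
```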

Info returned by the HEAD command:

HTTP/1.1 200 OK
Date: Thu, 26 Jul 2001 20:37:59 GMT
Server: Apache/1.3.12 (Unix)  (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html
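
  To back up the "nearly identical" remark, a short sketch that parses
the two header dumps shown in this message and lists the fields that
differ. The function names are my own, and the parsing is deliberately
simple (one header per line, no folded continuation lines).

```python
def parse_headers(block):
    """Split a raw response head into (status_line, {lowercased name: value})."""
    lines = block.strip().splitlines()
    status = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; Date values contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def differing_fields(a, b):
    """Return the sorted header names whose values differ or are missing."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


COMMON = """Server: Apache/1.3.12 (Unix)  (Red Hat/Linux) PHP/3.0.15
Last-Modified: Fri, 13 Jul 2001 14:03:22 GMT
ETag: "3058c-ab8-3b4effaa"
Accept-Ranges: bytes
Content-Length: 2744
Connection: close
Content-Type: text/html"""

# Headers stored in the cache's *.http file versus a later HEAD response.
cached = parse_headers(
    "HTTP/1.1 200 OK\nDate: Thu, 26 Jul 2001 20:35:15 GMT\n" + COMMON)
live = parse_headers(
    "HTTP/1.1 200 OK\nDate: Thu, 26 Jul 2001 20:37:59 GMT\n" + COMMON)

print(differing_fields(cached[1], live[1]))  # ['date']
```

  Only the Date: field changes between the two, which is what you would
expect for a page that was not modified in between.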

  And for those who really want to know more about the
HTTP 1.1 protocol (it's long):
http://www.w3.org/Protocols/rfc2068/rfc2068

 - Steve

