Re: [PATCH] Make caching hash more deterministic
On Sat, Aug 13, 2005 at 10:29:54AM +0200, Graham Leggett wrote: The idea of canonicalising the name is sound, but munging them into an added :80 and an added ? is really ugly - these are not the kind of URLs that an end user would understand at a glance if they had to see them listed. An end-user should never see these keys, the only place they are visible to any user is the semi-binary mod_disk_cache files. An administrator would have to really know what they're doing to find them, or be using htcacheadmin - once I finish that, and if it gets accepted. Is it possible to remove the :80 if the scheme is https, and remove the :443 if the scheme is https:? What is the significance of the added ?? The ? isn't me, that's current mod_cache behaviour, so I left it alone. It doesn't have any significance except for avoiding an extra condition. r-args is part of the key aswell, it just happens to have been NULL in those examples. Either way, doing as you suggest is trivial, but is there really a point adding more conditions? Any tool which does inspect the cache files can clean it up for presentation to the administrator. -- Colm MacCárthaighPublic Key: [EMAIL PROTECTED]
Re: [PATCH] Make caching hash more deterministic
On Aug 15, 2005, at 4:10 AM, Colm MacCarthaigh wrote: On Sat, Aug 13, 2005 at 10:29:54AM +0200, Graham Leggett wrote: The idea of canonicalising the name is sound, but munging them into an added :80 and an added ? is really ugly - these are not the kind of URLs that an end user would understand at a glance if they had to see them listed. An end-user should never see these keys, the only place they are visible to any user is the semi-binary mod_disk_cache files. An administrator would have to really know what they're doing to find them, or be using htcacheadmin - once I finish that, and if it gets accepted. Is it possible to remove the :80 if the scheme is https, and remove the :443 if the scheme is https:? What is the significance of the added ?? The ? isn't me, that's current mod_cache behaviour, so I left it alone. It doesn't have any significance except for avoiding an extra condition. r-args is part of the key aswell, it just happens to have been NULL in those examples. Either way, doing as you suggest is trivial, but is there really a point adding more conditions? Any tool which does inspect the cache files can clean it up for presentation to the administrator. I think Colm has a valid point... since these are internal representations then the cleaning up would best be done by the actual view process. I would imagine that keeping the internal representations consistent would streamline the actual functional aspects.
Re: [PATCH] Make caching hash more deterministic
Colm MacCarthaigh wrote: An end-user should never see these keys, the only place they are visible to any user is the semi-binary mod_disk_cache files. An administrator would have to really know what they're doing to find them, or be using htcacheadmin - once I finish that, and if it gets accepted. Ok cool - I was not sure whether the URL list would be mined for data for any reason. Either way, doing as you suggest is trivial, but is there really a point adding more conditions? Any tool which does inspect the cache files can clean it up for presentation to the administrator. In that case you're right - the less work done on the URL, the faster it will be. Regards, Graham --
Re: [PATCH] Make caching hash more deterministic
Colm MacCarthaigh wrote: Currently; GET / HTTP/1.1 Host: ftp.heanet.ie GET http://ftp.heanet.ie/ HTTP/1.0 GET HTTP://Ftp.Heanet.Ie/ HTTP/1.0 are all mapped to different hashes by mod_cache; despite being the same content, this is an inefficient waste of disk space and really awkward for me trying to write a debug/admin tool. The attached patch makes it deterministic, by mapping them all to; http://ftp.heanet.ie:80/?; The idea of canonicalising the name is sound, but munging them into an added :80 and an added ? is really ugly - these are not the kind of URLs that an end user would understand at a glance if they had to see them listed. Is it possible to remove the :80 if the scheme is https, and remove the :443 if the scheme is https:? What is the significance of the added ?? Regards, Graham --
Re: [PATCH] Make caching hash more deterministic
On Fri, Aug 12, 2005 at 01:34:50PM -0400, Jim Jagielski wrote: Here's a more involved patch that gets the logic right. It's 6pm on a Friday for me, so I have only tested it a little, but thought I'd share for comment before the weekend. +1 on inspection... testing to be done over the weekend :) Of course :) I've run http local and proxy cases, and ftp proxy cases, as well as a few odd things now. With UseCanonicalName on, it does improve the hitrates. I've changed the patch a little (attached) but only some cosmetic comment changes, and I ditched the local:// uri; Thinking about it, for a cache to be shared amongst protocols, things like the connection port would have to be faked anyway. So might aswell include the real serving protocol - makes much more sense to administrators. -- Colm MacCárthaighPublic Key: [EMAIL PROTECTED] Index: modules/cache/cache_storage.c === --- modules/cache/cache_storage.c (revision 232437) +++ modules/cache/cache_storage.c (working copy) @@ -320,12 +320,82 @@ apr_status_t cache_generate_key_default(request_rec *r, apr_pool_t* p, char**key) { -if (r-hostname) { -*key = apr_pstrcat(p, r-hostname, r-uri, ?, r-args, NULL); +char *port_str, *scheme, *hn; +const char * hostname; +int i; + +/* Use the canonical name to improve cache hit rate, but only if this is + * not a proxy request. + */ +if (!r-proxyreq) { +/* Use _default_ as the hostname if none present, as in mod_vhost */ +hostname = ap_get_server_name(r); +if (!hostname) { +hostname = _default_; +} } +else if(r-parsed_uri.hostname) { +/* Copy the parsed uri hostname */ +hn = apr_pcalloc(p, strlen(r-parsed_uri.hostname) + 1); +for (i = 0; r-parsed_uri.hostname[i]; i++) { +hn[i] = apr_tolower(r-parsed_uri.hostname[i]); +} +/* const work-around */ +hostname = hn; +} else { -*key = apr_pstrcat(p, r-uri, ?, r-args, NULL); +/* We are a proxied request, with no hostname. Unlikely + * to get very far - but just in case */ +hostname = _default_; } + +/* Copy the scheme, ensuring that it is lower case. If the parsed uri + * contains no string or if this is not a proxy request. + */ +if (r-proxyreq r-parsed_uri.scheme) { +/* Copy the scheme */ +scheme = apr_pcalloc(p, strlen(r-parsed_uri.scheme) + 1); +for (i = 0; r-parsed_uri.scheme[i]; i++) { +scheme[i] = apr_tolower(r-parsed_uri.scheme[i]); +} +} +else { +scheme = http; +} + +/* If the content is locally generated, use the port-number of the + * current server. Otherwise. copy the URI's port-string (which may be a + * service name). If the URI contains no port-string, use apr-util's + * notion of the default port for that scheme - if available. + */ +if(r-proxyreq) { +if (r-parsed_uri.port_str) { +port_str = apr_pcalloc(p, strlen(r-parsed_uri.port_str) + 2); +port_str[0] = ':'; +for (i = 0; r-parsed_uri.port_str[i]; i++) { +port_str[i + 1] = apr_tolower(r-parsed_uri.port_str[i]); +} +} +else if (apr_uri_port_of_scheme(scheme)) { +port_str = apr_psprintf(p, :%u, apr_uri_port_of_scheme(scheme)); +} +else { +/* No port string given in the AbsoluteUri, and we have no + * idea what the default port for the scheme is. Leave it + * blank and live with the inefficiency of some extra cached + * entities. + */ +port_str = ; +} +} +else { +/* Use the server port */ +port_str = apr_psprintf(p, :%u, ap_get_server_port(r)); +} + +/* Key format is a URI */ +*key = apr_pstrcat(p, scheme, ://, hostname, port_str, + r-parsed_uri.path, ?, r-args, NULL); + return APR_SUCCESS; } -
Re: [PATCH] Make caching hash more deterministic
Should this honor usecanonicalname? If so, could just use ap_get_servername(r) in stead of r-hostname. This may further compact the number of entries. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: [PATCH] Make caching hash more deterministic
On Fri, Aug 12, 2005 at 11:54:44AM -0400, Brian Akins wrote: Should this honor usecanonicalname? If so, could just use ap_get_servername(r) in stead of r-hostname. This may further compact the number of entries. Yes, but I think there'd have to be additional code to detect the proxy cases. And you pointing that out has just reminded me of a bug in my patch - it doesn't work for; GET ftp://ftp.heanet.ie/pub/heanet/100.txt HTTP/1.0 I'll go make that work too. -- Colm MacCárthaighPublic Key: [EMAIL PROTECTED]
Re: [PATCH] Make caching hash more deterministic
On Fri, Aug 12, 2005 at 04:59:20PM +0100, Colm MacCarthaigh wrote: On Fri, Aug 12, 2005 at 11:54:44AM -0400, Brian Akins wrote: Should this honor usecanonicalname? If so, could just use ap_get_servername(r) in stead of r-hostname. This may further compact the number of entries. Yes, but I think there'd have to be additional code to detect the proxy cases. And you pointing that out has just reminded me of a bug in my patch - it doesn't work for; GET ftp://ftp.heanet.ie/pub/heanet/100.txt HTTP/1.0 I'll go make that work too. Here's a more involved patch that gets the logic right. It's 6pm on a Friday for me, so I have only tested it a little, but thought I'd share for comment before the weekend. -- Colm MacCárthaighPublic Key: [EMAIL PROTECTED] Index: modules/cache/cache_storage.c === --- modules/cache/cache_storage.c (revision 232304) +++ modules/cache/cache_storage.c (working copy) @@ -318,12 +318,90 @@ apr_status_t cache_generate_key_default(request_rec *r, apr_pool_t* p, char**key) { -if (r-hostname) { -*key = apr_pstrcat(p, r-hostname, r-uri, ?, r-args, NULL); +const char *hostname; +char *port_str, *scheme, *hn; +int i; + +/* Use the canonical name to improve cache hit rate, but only if this is + * not a proxy request. + */ +if (!r-proxyreq) { +/* Use _default_ as the hostname if none present, as in mod_vhost + */ +hostname = ap_get_server_name(r); +if (!hostname) { +hostname = _default_; +} } +else if(r-parsed_uri.hostname) { +/* Copy the parsed uri hostname */ +hn = apr_pcalloc(p, strlen(r-parsed_uri.hostname) + 1); +for (i = 0; r-parsed_uri.hostname[i]; i++) { +hn[i] = apr_tolower(r-parsed_uri.hostname[i]); +} + +/* const work-around */ +hostname = hn; +} else { -*key = apr_pstrcat(p, r-uri, ?, r-args, NULL); +/* We are a proxied request, with no hostname. Unlikely + * to get very far - but just in case */ +hostname = _default_; } + +/* Copy the scheme, ensuring that it is lower case. If the parsed uri + * contains no string or if this is not a proxy request, we use local as + * the default. + * + * Why local? Ans: to indicate that the content is locally generated, and + * because Apache can serve multiple protocols, lets not get tied to a + * single one. This way a mod_[ftp|bittorrent|foobar] front-end can share + * our content cache. + */ +if (r-proxyreq r-parsed_uri.scheme) { +/* Copy the scheme */ +scheme = apr_pcalloc(p, strlen(r-parsed_uri.scheme) + 1); +for (i = 0; r-parsed_uri.scheme[i]; i++) { +scheme[i] = apr_tolower(r-parsed_uri.scheme[i]); +} +} +else { +scheme = local; +} + +/* If the content is locally generated, use the port-number of the + * current server. Otherwise. copy the URI's port-string (which may be a + * service name). If the URI contains no port-string, use apr-util's + * notion of the default port for that scheme - if available. + */ +if(r-proxyreq) { +if (r-parsed_uri.port_str) { +port_str = apr_pcalloc(p, strlen(r-parsed_uri.port_str) + 2); +port_str[0] = ':'; +for (i = 0; r-parsed_uri.port_str[i]; i++) { +port_str[i + 1] = apr_tolower(r-parsed_uri.port_str[i]); +} +} +else if (apr_uri_port_of_scheme(scheme)) { +port_str = apr_psprintf(p, :%u, apr_uri_port_of_scheme(scheme)); +} +else { +/* No port string given in the AbsoluteUri, and we have no + * idea what the default port for the scheme is. Leave it + * blank and live with the inefficiency of some extra cached + * entities. + */ +port_str = ; +} +} +else { +/* Use the server port */ +port_str = apr_psprintf(p, :%u, ap_get_server_port(r)); +} + +/* Key format is a URI */ +*key = apr_pstrcat(p, scheme, ://, hostname, port_str, + r-parsed_uri.path, ?, r-args, NULL); + return APR_SUCCESS; } -
Re: [PATCH] Make caching hash more deterministic
On Aug 12, 2005, at 1:10 PM, Colm MacCarthaigh wrote: On Fri, Aug 12, 2005 at 04:59:20PM +0100, Colm MacCarthaigh wrote: On Fri, Aug 12, 2005 at 11:54:44AM -0400, Brian Akins wrote: Should this honor usecanonicalname? If so, could just use ap_get_servername(r) in stead of r-hostname. This may further compact the number of entries. Yes, but I think there'd have to be additional code to detect the proxy cases. And you pointing that out has just reminded me of a bug in my patch - it doesn't work for; GET ftp://ftp.heanet.ie/pub/heanet/100.txt HTTP/1.0 I'll go make that work too. Here's a more involved patch that gets the logic right. It's 6pm on a Friday for me, so I have only tested it a little, but thought I'd share for comment before the weekend. +1 on inspection... testing to be done over the weekend :)