Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On 10/27/2006 03:28 PM, [EMAIL PROTECTED] wrote: Author: minfrin Date: Fri Oct 27 06:28:56 2006 New Revision: 468373 URL: http://svn.apache.org/viewvc?view=revrev=468373 Log: mod_cache: Pass the output filter stack through the store_body() hook, giving each cache backend the ability to make a better decision as to how it will allocate the tasks of writing to the cache and writing to the network. Previously the write to the cache task needed to be complete before the same brigade was written to the network, and this caused timing and memory issues on large cached files. This fix replaces the previous fix for PR39380. Modified: httpd/httpd/trunk/modules/cache/mod_disk_cache.c URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/cache/mod_disk_cache.c?view=diffrev=468373r1=468372r2=468373 == --- httpd/httpd/trunk/modules/cache/mod_disk_cache.c (original) +++ httpd/httpd/trunk/modules/cache/mod_disk_cache.c Fri Oct 27 06:28:56 2006 @@ -1286,9 +1292,15 @@ static apr_status_t open_new_file(request_rec *r, const char *filename, apr_file_t **fd, disk_cache_conf *conf) { -int flags = APR_CREATE | APR_WRITE | APR_BINARY | APR_BUFFERED | APR_EXCL; +int flags; apr_status_t rv; +flags = APR_CREATE | APR_WRITE | APR_READ | APR_BINARY | APR_BUFFERED | APR_EXCL | APR_TRUNCATE; +#if APR_HAS_SENDFILE +flags |= ((pdconf-enable_sendfile == ENABLE_SENDFILE_OFF) + ? 0 : APR_SENDFILE_ENABLED); +#endif + This breaks compilation of mod_disk_cache on systems that have APR_HAS_SENDFILE set. Regards RĂ¼diger
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
Justin Erenkrantz wrote: I will also note that the mod_cache provider system has explicit versioning, so any modifications to the providers should be represented with a new version number. (i.e. providers for version 0 should work while offering new features in version 1-class providers.) We do not arbitrarily tweak the old provider structures any more - instead, we introduce new versions. -- justin Justin, Can you clarify the above a bit? I don't understand what you're referring to. Looking at the 2.2.3 tag, what versioning is currently in place? Thanks, Issac
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On 10/30/06, Issac Goldstand [EMAIL PROTECTED] wrote: Can you clarify the above a bit? I don't understand what you're referring to. Looking at the 2.2.3 tag, what versioning is currently in place? Look at disk_cache_register_hook. /* cache initializer */ ap_register_provider(p, CACHE_PROVIDER_GROUP, disk, 0, cache_disk_provider); The 0 is the version string with the cache_disk_provider vtable acting as the 0 version.. When we have a new cache provider version (i.e. the vtable changes), that gets bumped up to 1. Then, mod_cache can select a 0 or 1-class backend. Preferably, it should be able to use both; but trunk would prefer 1 and fallback to 0 if it's not available. HTH. -- justin
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
Justin Erenkrantz wrote: On 10/30/06, Issac Goldstand [EMAIL PROTECTED] wrote: Can you clarify the above a bit? I don't understand what you're referring to. Looking at the 2.2.3 tag, what versioning is currently in place? Look at disk_cache_register_hook. /* cache initializer */ ap_register_provider(p, CACHE_PROVIDER_GROUP, disk, 0, cache_disk_provider); The 0 is the version string with the cache_disk_provider vtable acting as the 0 version.. When we have a new cache provider version (i.e. the vtable changes), that gets bumped up to 1. Then, mod_cache can select a 0 or 1-class backend. Preferably, it should be able to use both; but trunk would prefer 1 and fallback to 0 if it's not available. Gotcha. Thanks for the insight. Looking at provider.c, a couple of questions spring to mind: 1) Why isn't this part of apr-util? (it seems similar to apr_optional.h - just intended for vtables rather than functions, and with this version info) 2) It seems that if trunk would want to use version 1 and fallback to 0, the control logic for that would need to be in ap_cache_get_providers and would need to check every version string mod_cache is interested in (for an exact string match, unless the hash function's doing something funky). Wouldn't it be smarter to use a numeric version and simply have ap_lookup_provider simply return the provider of the requested group/type with the highest id? It would make custom 3rd party modules easier to write too; we could define, say 1 as PROVIDER_ID_CUSTOM, making it easier to write add-ons to modules which use the provider interface to just write new providers with that ID (or that ID +1 or -1) rather than having to both write the new provider and also change the code which calls ap_lookup_provider to request the new provider version. Or did I miss the point somewhere?
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On Mon, 30 Oct 2006 01:53:18 +0200 Graham Leggett [EMAIL PROTECTED] wrote: The current expectation that it be possible to separate completely the storing of the cached response and the delivery of the content is broken. Why is that? (references to previous discussion will do, if applicable) We have a real world case where the cache is expected to process a many MB or many GB file completely, before sending that same response to the network. This is too slow, and takes up too much RAM, resulting in a broken response to the client. That suggests broken or implementation and/or inappropriate usage. It says nothing about expectation. On wednesday night I wrote a patch that solved the large file problem, while maintaining the current separation between write-to-cache and write-to-network as you assert. This mod_cache code broke up the brigade into bite sized chunks inside mod_cache before passing it to write-to-cache, then write-to-network, and so on. Joe vetoed the patch, saying that it duplicated the natural behaviour of apr_bucket_read(). The wednesday night patch was reverted, and thursday night was spent instead changing the cache_body() signature to make its own better judgement on how to handle cached files. Now you veto this next patch, saying it breaks the abstraction. All of which says this work is in a state of flux. I agree with you that a design needs to be found on list first, as I have wasted enough time going round in circles coming up with solution after solution nobody is happy with. Do we put this to a vote? There are quite a lot of contributors, and there may be competing designs. That sounds like the kind of situation where a single /trunk/ and a single cache framework instance is hopelessly inadequate. If necessary, let it fork, demo your ideas on a branch, then present that for discussion. Isn't that approximately what Paul suggested some time back? -- Nick Kew
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On 10/30/06, Issac Goldstand [EMAIL PROTECTED] wrote: Looking at provider.c, a couple of questions spring to mind: 1) Why isn't this part of apr-util? (it seems similar to apr_optional.h - just intended for vtables rather than functions, and with this version info) apr_optional is an ugly hack without a good API. =) The generic provider code grew out of mod_dav's provider interface. The aaa modules also uses this same interface as well. 2) It seems that if trunk would want to use version 1 and fallback to 0, the control logic for that would need to be in ap_cache_get_providers and would need to check every version string mod_cache is interested in (for an exact string match, unless the hash function's doing something funky). Wouldn't it be smarter to use a numeric version and simply have ap_lookup_provider simply return the provider of the requested group/type with the highest id? Yes, ap_cache_get_providers() would have the logic for checking 0 and 1 versions. Note that the APR hash abstraction that the provider is built on top of only works with strings - not integers. The general provider API is intended to only return specific versions that it's asked for. Let me explain... It would make custom 3rd party modules easier to write too; we could define, say 1 as PROVIDER_ID_CUSTOM, making it easier to write add-ons to modules which use the provider interface to just write new providers with that ID (or that ID +1 or -1) rather than having to both write the new provider and also change the code which calls ap_lookup_provider to request the new provider version. Or did I miss the point somewhere? Nah, that's not the point - you use the name value of the provider for third-party modules. The version number is fixed across all providers (generally). If I implement version 0 of the cache interface say in mod_disk_cache, then I call ap_register_provider() with version 0 with the name disk. In a hypothetical mod_large_disk_cache that implements version 1 of the cache interface, I then register version 1 with the name thumper (say). mod_cache is then taught how to handle version 0 and 1 providers. The user comes along with the following directive: CacheEnable thumper / mod_cache will then look up thumper version 1 and then 0. It'll find version 1 of the vtable and use that for its cache. However, if we wanted to do so, thumper could also register as disk version 1. So, a mod_cache that understands version 1 would use mod_large_disk_cache instead of mod_disk_cache (which doesn't implement version 0) if it understands version 1 providers. I hope this makes it a bit clearer. -- justin
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
Justin Erenkrantz wrote: On 10/30/06, Issac Goldstand [EMAIL PROTECTED] wrote: Looking at provider.c, a couple of questions spring to mind: 1) Why isn't this part of apr-util? (it seems similar to apr_optional.h - just intended for vtables rather than functions, and with this version info) apr_optional is an ugly hack without a good API. =) Point still stands, though, that this might still fit there. The generic provider code grew out of mod_dav's provider interface. The aaa modules also uses this same interface as well. Right, I saw those this morning when I looked at provider.c 2) It seems that if trunk would want to use version 1 and fallback to 0, the control logic for that would need to be in ap_cache_get_providers and would need to check every version string mod_cache is interested in (for an exact string match, unless the hash function's doing something funky). Wouldn't it be smarter to use a numeric version and simply have ap_lookup_provider simply return the provider of the requested group/type with the highest id? Yes, ap_cache_get_providers() would have the logic for checking 0 and 1 versions. Note that the APR hash abstraction that the provider is built on top of only works with strings - not integers. The general provider API is intended to only return specific versions that it's asked for. Let me explain... It would make custom 3rd party modules easier to write too; we could define, say 1 as PROVIDER_ID_CUSTOM, making it easier to write add-ons to modules which use the provider interface to just write new providers with that ID (or that ID +1 or -1) rather than having to both write the new provider and also change the code which calls ap_lookup_provider to request the new provider version. Or did I miss the point somewhere? Nah, that's not the point - you use the name value of the provider for third-party modules. The version number is fixed across all providers (generally). If I implement version 0 of the cache interface say in mod_disk_cache, then I call ap_register_provider() with version 0 with the name disk. In a hypothetical mod_large_disk_cache that implements version 1 of the cache interface, I then register version 1 with the name thumper (say). mod_cache is then taught how to handle version 0 and 1 providers. The user comes along with the following directive: CacheEnable thumper / mod_cache will then look up thumper version 1 and then 0. It'll find version 1 of the vtable and use that for its cache. However, if we wanted to do so, thumper could also register as disk version 1. So, a mod_cache that understands version 1 would use mod_large_disk_cache instead of mod_disk_cache (which doesn't implement version 0) if it understands version 1 providers. I hope this makes it a bit clearer. -- justin It did. I think I need to digest it a bit more. I completely understand the logic of using provider names for third-party stuff, but some little voice in the back of my mind is still nagging me about the versioning logic; it's just going to be a pain to repeat the selection code for every module (eg, dav, aaa) that wants to bump a version and would be easier if the API just did it for you. Can't have everything, I guess... Regardless, many thanks for the insight! Issac
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On Fri, Oct 27, 2006 at 01:28:57PM -, [EMAIL PROTECTED] wrote: Author: minfrin Date: Fri Oct 27 06:28:56 2006 New Revision: 468373 URL: http://svn.apache.org/viewvc?view=revrev=468373 Log: mod_cache: Pass the output filter stack through the store_body() hook, giving each cache backend the ability to make a better decision as to how it will allocate the tasks of writing to the cache and writing to the network. Previously the write to the cache task needed to be complete before the same brigade was written to the network, and this caused timing and memory issues on large cached files. This fix replaces the previous fix for PR39380. -1. This breaks the abstraction between the cache providers and the filter streams. The cache providers should not be in the business of delivering content down to the next filter - that is the job of mod_cache. Following this route is completely anti-thetical to the separation between storing the cache response and delivery of the content. As others have mentioned, I would highly recommend that you take a step back and come up with a design on-list first, run it through the gauntlet on dev@, and test it before breaking carefully designed abstractions. -- justin
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
Justin Erenkrantz wrote: -1. This breaks the abstraction between the cache providers and the filter streams. The cache providers should not be in the business of delivering content down to the next filter - that is the job of mod_cache. Following this route is completely anti-thetical to the separation between storing the cache response and delivery of the content. The current expectation that it be possible to separate completely the storing of the cached response and the delivery of the content is broken. We have a real world case where the cache is expected to process a many MB or many GB file completely, before sending that same response to the network. This is too slow, and takes up too much RAM, resulting in a broken response to the client. On wednesday night I wrote a patch that solved the large file problem, while maintaining the current separation between write-to-cache and write-to-network as you assert. This mod_cache code broke up the brigade into bite sized chunks inside mod_cache before passing it to write-to-cache, then write-to-network, and so on. Joe vetoed the patch, saying that it duplicated the natural behaviour of apr_bucket_read(). The wednesday night patch was reverted, and thursday night was spent instead changing the cache_body() signature to make its own better judgement on how to handle cached files. Now you veto this next patch, saying it breaks the abstraction. So, we have disgreement over the right way to solve the problem of the cache being expected to swallow mouthfuls too big for it to handle. I agree with you that a design needs to be found on list first, as I have wasted enough time going round in circles coming up with solution after solution nobody is happy with. Do we put this to a vote? Regards, Graham -- smime.p7s Description: S/MIME Cryptographic Signature
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On 10/29/06, Graham Leggett [EMAIL PROTECTED] wrote: The current expectation that it be possible to separate completely the storing of the cached response and the delivery of the content is broken. We have a real world case where the cache is expected to process a many MB or many GB file completely, before sending that same response to the network. This is too slow, and takes up too much RAM, resulting in a broken response to the client. In short, I haven't seen any evidence presented by you or others that this is due to a design flaw in the cache/provider abstraction. At best, mod_disk_cache could be smarter about storing the file when it's large - but that's just a small lines of code to fix - it doesn't require any massive changes or new bucket types. Just copy the file bucket and consume that within store_body()'s implementation. If that isn't enough, please identify here on list and backed by references to the implementation and timings that it is 'too slow', 'takes up too much RAM', and results in a 'broken response to the client' and why we must break all of our cache structures and abstractions. I'm tired of reviewing large code changes with only vague generalities about why the code must change. These decisions need to be explained and reviewed on list before any more code is committed. Your recent commits have been chock full of mistakes and style violations - which make it almost impossible to review what's going on. If you are going to commit, please take care to follow all of our style guidelines and please ensure the commit works before using our trunk as your personal dumping ground. If your changes are incomplete, feel free to post them to the list instead of committing. Looking at the current implementation of mod_disk_cache, someone has turned it into unreadable and unmanagable code. Take a look at: http://svn.apache.org/repos/asf/httpd/httpd/tags/2.2.3/modules/cache/mod_disk_cache.c versus http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/cache/mod_disk_cache.c The code somehow went from 9KB total to 15KB for no good reason. Somewhere, we went really wrong here. So, we have disgreement over the right way to solve the problem of the cache being expected to swallow mouthfuls too big for it to handle. I agree with you that a design needs to be found on list first, as I have wasted enough time going round in circles coming up with solution after solution nobody is happy with. Do we put this to a vote? We're not even close to knowing what we'd be voting on. So, please draft up a proposed design that explains in detail why you think there is a problem and what specific changes you propose and why that is the best solution. If you want to submit a patch to go with your rationale, cool. Yet, I certainly don't see any fundamental reason why Joe's concerns and mine can't both be addressed in the same design. I will also note that the mod_cache provider system has explicit versioning, so any modifications to the providers should be represented with a new version number. (i.e. providers for version 0 should work while offering new features in version 1-class providers.) We do not arbitrarily tweak the old provider structures any more - instead, we introduce new versions. -- justin
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
[EMAIL PROTECTED] wrote: Author: minfrin Date: Fri Oct 27 06:28:56 2006 New Revision: 468373 URL: http://svn.apache.org/viewvc?view=revrev=468373 Log: mod_cache: Pass the output filter stack through the store_body() hook, giving each cache backend the ability to make a better decision as to how it will allocate the tasks of writing to the cache and writing to the network. Previously the write to the cache task needed to be complete before the same brigade was written to the network, and this caused timing and memory issues on large cached files. This fix replaces the previous fix for PR39380. Modified: httpd/httpd/trunk/CHANGES httpd/httpd/trunk/modules/cache/mod_cache.c httpd/httpd/trunk/modules/cache/mod_cache.h httpd/httpd/trunk/modules/cache/mod_disk_cache.c httpd/httpd/trunk/modules/cache/mod_disk_cache.h httpd/httpd/trunk/modules/cache/mod_mem_cache.c .. @@ -1286,9 +1292,15 @@ static apr_status_t open_new_file(request_rec *r, const char *filename, apr_file_t **fd, disk_cache_conf *conf) { -int flags = APR_CREATE | APR_WRITE | APR_BINARY | APR_BUFFERED | APR_EXCL; +int flags; apr_status_t rv; +flags = APR_CREATE | APR_WRITE | APR_READ | APR_BINARY | APR_BUFFERED | APR_EXCL | APR_TRUNCATE; Nitpick: APR_TRUNCATE here is useless. +#if APR_HAS_SENDFILE +flags |= ((pdconf-enable_sendfile == ENABLE_SENDFILE_OFF) + ? 0 : APR_SENDFILE_ENABLED); +#endif + Where is pdconf ? Check out all those APR_HAS_SENDFILE. while(1) { rv = apr_file_open(fd, filename, flags, APR_FPROT_UREAD | APR_FPROT_UWRITE, r-pool); @@ -1611,150 +1623,50 @@ return APR_SUCCESS; } Looking at open_new_file, it is somewhat unreliable: if(finfo.mtime (apr_time_now() - conf-updtimeout) ) { We use APR_BUFFERED and under various circumstances files may get modified without having the mtime updated. .. +/** + * Store the body of the response in the disk cache. + * + * As the data is written to the cache, it is also written to + * the filter provided. On network write failure, the full body + * will still be cached. + */ +static apr_status_t store_body(cache_handle_t *h, ap_filter_t *f, apr_bucket_brigade *bb) { -apr_bucket *e; +apr_bucket *e, *b; +request_rec *r = f-r; apr_status_t rv; -int copy_file = FALSE; disk_cache_object_t *dobj = (disk_cache_object_t *) h-cache_obj-vobj; disk_cache_conf *conf = ap_get_module_config(r-server-module_config, disk_cache_module); dobj-store_body_called++; - + White space trash. if(r-no_cache) { ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r-server, disk_cache: store_body called for URL %s even though no_cache is set, dobj-name); file_cache_errorcleanup(dobj, r); -return APR_EGENERAL; +ap_remove_output_filter(f); +return ap_pass_brigade(f-next, bb); } if(dobj-initial_size == 0) { /* Don't waste a body cachefile on a 0 length body */ -return APR_SUCCESS; +return ap_pass_brigade(f-next, bb); } if(!dobj-skipstore dobj-fd == NULL) { rv = open_new_file(r, dobj-datafile, (dobj-fd), conf); -if(rv == CACHE_EEXIST) { +if (rv == CACHE_EEXIST) { /* Someone else beat us to storing this */ dobj-skipstore = TRUE; } -else if(rv != APR_SUCCESS) { -return rv; +else if (rv != APR_SUCCESS) { +ap_log_error(APLOG_MARK, APLOG_ERR, 0, r-server, + disk_cache: store_body tried to open cached file + for URL %s and this failed, dobj-name); +ap_remove_output_filter(f); +return ap_pass_brigade(f-next, bb); } else { dobj-file_size = 0; @@ -1762,194 +1674,227 @@ } if(dobj-skipstore) { -/* Someone else beat us to storing this object */ -if(dobj-store_body_called == 1 -dobj-initial_size 0 -APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb)) ) -{ -/* Yay, we can replace the body with the cached instance */ -return replace_brigade_with_cache(h, r, bb); -} - -return APR_SUCCESS; +/* Someone else beat us to storing this object. + * We are too late to take advantage of this storage :( */ +ap_remove_output_filter(f); +return ap_pass_brigade(f-next, bb); } Those if( and if ( are going on my nerves! :) .. +/* start caching the brigade */ +ap_log_error(APLOG_MARK, APLOG_INFO, 0, r-server, + disk_cache: Caching body for URL %s, dobj-name); -} -else { -
Re: svn commit: r468373 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_cache.c modules/cache/mod_cache.h modules/cache/mod_disk_cache.c modules/cache/mod_disk_cache.h modules/cache/mod_mem_cache.c
On Fri, Oct 27, 2006 at 11:38:02AM -0300, Davi Arnaut wrote: +/* Is our network connection still alive? + * If not, we must continue caching the file, so keep looping. + * We will return the error at the end when caching is done. + */ +if (APR_SUCCESS == dobj-frv) { + +/* insert a file bucket pointing to the cache into out temporary brigade */ +if (diskcache_brigade_insert(dobj-tmpbb, dobj-fd, dobj-file_size, + written, + dobj-updtimeout, r-pool) == NULL) { + return APR_ENOMEM; +} Err. We had the data in memory, we are going to read it back from disk again just in order to not block ? That's nonsense. Agreed. We don't need a special bucket type! Agreed. So -1 on this, on r462696, and on r450105. joe