Hi -- The short version of this email is, please, can people review this patch for server/request.c and see if it breaks anything? There are versions for trunk and 2.2.x. Thanks in advance!
http://people.apache.org/~chrisd/patches/walk_cache/ I recently finished setting up mod_dav behind mod_authn_dbd, and while we were testing I noticed that certain WebDAV clients (*ahem, DAVExplorer*) send PROPFIND requests that cause mod_dav to recurse down through the contents of the requested directory, generating sub-requests for each file or directory it encounters. Presumably such clients send a large DAV depth header value; I didn't really probe the exact request contents. What surprised me was discovering that each sub-request (or, equally, internal redirect) went through the authnz steps despite the fact that I just had a single blanket authnz configuration for the entire directory on which I'd enabled mod_dav. This meant that if a user happened to request a directory with, say, 1000 files, each file triggered a mod_authn_dbd request to the database. Clearly, this was going to drag performance into the ground if released as-was, since we have lots of really large directories. It surprised me in part because in ap_process_request_internal() there are comments and code like: /* Skip authn/authz if the parent or prior request passed the authn/authz, * and that configuration didn't change (this requires optimized _walk() * functions in map_to_storage that use the same merge results given * identical input.) If the config changes, we must re-auth. */ if (r->main && (r->main->per_dir_config == r->per_dir_config)) { r->user = r->main->user; r->ap_auth_type = r->main->ap_auth_type; } else if (r->prev && (r->prev->per_dir_config == r->per_dir_config)) { r->user = r->prev->user; r->ap_auth_type = r->prev->ap_auth_type; } And in make_sub_request(): /* Start a clean config from this subrequest's vhost. Optimization in * Location/File/Dir walks from the parent request assure that if the * config blocks of the subrequest match the parent request, no merges * will actually occur (and generally a minimal number of merges are * required, even if the parent and subrequest aren't quite identical.) */ But, as it turned out, the ap_*_walk() functions only set r->per_dir_config to r->main->per_dir_config (or r->prev->per_dir_config) if the URI or filename exactly matches. The relevant bits look like: /* If we have an cache->cached location that matches r->uri, * and the vhost's list of locations hasn't changed, we can skip * rewalking the location_walk entries. */ if (cache->cached && (cache->dir_conf_tested == sec_ent) && (strcmp(entry_uri, cache->cached) == 0)) { ... if (r->per_dir_config == cache->dir_conf_merged) { r->per_dir_config = cache->per_dir_result; return OK; } This is why all the sub-requests generated by mod_dav, where a sub-request's r->uri might be "/foo/bar.html" while the main request's r->uri was "/foo/", were each getting their own r->per_dir_config, so that ap_process_request_internal() re-ran all the authnz checks for each one. However, the code in the ap_*_walk() functions for these cases looked like it might be very close to being able to recognize them as not requiring their own privately merged r->per_dir_config. As the functions step through the config sections they detect any variation from the sections which matched for the previous or parent request; if none is found, now_merged is set to the last set of merged configs from the cache and that's then merged onto r->per_dir_config. If they could also detect that r->per_dir_config was the same as cache->dir_conf_merged -- that is, if the previous or parent request started from the same config -- then maybe they could set r->per_dir_config to cache->per_dir_result just like the case where the URI or filenames matched exactly. That's what this patch does, in sum. There's obviously one big assumption here, that it's OK to have lots of sub-requests and internal redirects pointing to the original request's r->per_dir_config, even if their URIs don't match. So far it seems to work OK for me, but there may be problems I haven't perceived or stumbled on! So, testing and review would be much appreciated. The changes to the ap_*_walk() functions are pretty clear, I think. The big change is to prep_walk_cache(). The problem I had there was that ap_process_request_internal() resets r->per_dir_config to r->server->lookup_defaults after the first call to ap_location_walk(). Here's a crummy diagram of what seems to happen to the r->dir_conf_merged and r->per_dir_result values in the cache as you pass through the initial request and then a sub-request (S = r->server->lookup_defaults): dir_conf_merged per_dir_result init: ap_location_walk: S -> A (reset) ap_directory_walk: S -> B ap_file_walk: B -> C ap_location_walk: C -> D (overwrites cache) sub: ap_location_walk: S -> W (reset) ap_directory_walk: S -> X ap_file_walk: X -> Y ap_location_walk: Y -> Z (overwrites cache) So the first call to ap_location_walk() for the sub-request sees the C -> D values, while the second sees the S -> W values. This wasn't going to work with my idea, because I needed each call to an ap_*_walk() function in the sub-request to check its r->per_dir_config against the r->dir_conf_merged from the corresponding invocation of the same function in the initial request. Thus, the patch introduces a singly-linked list of walk_cache_t structures and a count value so that prep_walk_cache() can look up the walk cache for the corresponding invocation of each function. One other small thing this patch does is reverse the checks for r->main and r->prev in a couple of places. Reviewing how sub-requests (r->main set) and internal redirects (r->prev set) are created in internal_internal_redirect() and ap_set_sub_req_protocol(), it looked to me as though r->prev is NULL for sub-requests, but r->main is r->prev->main for internal redirects. So, if one imagines a chain of them, say: init -> sub1 -> sub2 -> redir1 -> sub3 then to step backwards through the chain, you want to step back to r->prev first (if it's non-NULL) because otherwise you'll step from redir1 back to sub1, skipping sub2. (While redir1->prev == sub2, redir1->main == sub2->main == sub1. I think.) At any rate, this seems to be how log_backtrace() works, so I figured it was probably worth making request.c conform. If it breaks anything, though, it's not actually relevant to the walk cache changes and could be left as-is. So ... comments, criticisms? What's the policy on changes like this going into trunk, if I don't hear much from anyone? Thanks all, Chris. -- GPG Key ID: 366A375B GPG Key Fingerprint: 485E 5041 17E1 E2BB C263 E4DE C8E3 FA36 366A 375B
