Re: what is in modules vs what is in the core
On Mar 30, 2009, at 7:37 PM, Paul Querna wrote:
> mod_watchdog is the latest offender in a series of modules that expose additional functions to the API. (mod_proxy and mod_cache do too!) What happened to "all functions that are not inside server/* must be either dynamic optional functions or hooks"?

Some modules (mostly 3rd party??) allow it either way - optional function or just linkage. I'm personally a fan of hooks and providers. (With providers, I usually just do the lookup once in, say, post-config, and cache the results in the subscribing module - this saves some hash lookups on potentially every single request.)

As I hack on some lua stuff, it's useful to have the symbols for functions. That may just be because I'm lazy, because I could do optional function lookups in library opens, I suppose. OT, but I like my Lua glue in a lua module and just use require 'apache2.memcache' (or whatever) to do the linking. This works really well with per-thread lua states that are all loaded at startup... (hint, hint)

--Brian
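The "do the lookup once in post-config and cache the result" pattern described above can be sketched in plain C. This is a standalone stand-in, not the real httpd provider API (`ap_register_provider`/`ap_lookup_provider`); the registry, names, and functions here are all illustrative.

```c
#include <stddef.h>
#include <string.h>

/* A toy provider registry standing in for httpd's: lookup by name is
 * the "expensive" step we want to do only once. */
typedef int (*provider_fn)(int);

struct provider { const char *name; provider_fn fn; };

static int double_it(int n) { return n * 2; }

static struct provider registry[] = {
    { "double", double_it },
    { NULL, NULL }
};

/* The lookup: walk the registry by name (stands in for a hash lookup). */
static provider_fn lookup_provider(const char *name)
{
    for (struct provider *p = registry; p->name; p++)
        if (strcmp(p->name, name) == 0)
            return p->fn;
    return NULL;
}

/* Done once at "post_config": cache the function pointer in the
 * subscribing module... */
static provider_fn cached;
static void post_config(void) { cached = lookup_provider("double"); }

/* ...so the per-request path is just an indirect call, no lookup. */
static int per_request(int n) { return cached ? cached(n) : -1; }
```

The per-request path then never touches the registry, which is the hash-lookup saving the post describes.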
Re: mod_serf: now somewhat working
On Mar 28, 2009, at 11:09 AM, Paul Querna wrote:
> - Much simpler configuration than mod_proxy, using Location blocks (or LocationMatch), rather than ProxyPass' hacking of URI stuff way earlier.

It'd be nice to be able to configure and do stuff at run time using Lua. Someone had to ask...

--Brian
Re: mod_dbd analogue for memcached
On Mar 6, 2009, at 1:25 AM, Kevac Marko wrote:
> And after that you have to restart apache? It's not dynamic enough for us.

graceful restart.
Re: mod_dbd analogue for memcached
On 3/5/09 9:57 AM, Kevac Marko ma...@kevac.org wrote:
> The question I want to ask is: should I base my module on apr_memcache or not? Is apr_memcache mature enough? Is it used by anyone?

Yes. Yes. Yes.

Is this based on the existing mod_memcache?

--Brian
Re: mod_dbd analogue for memcached
On Mar 5, 2009, at 4:35 PM, Kevac Marko wrote:
> mod_memcache, if we are talking about http://code.google.com/p/modmemcache/, is too simple.

Works fine for me :)

> I need multiple named pools to multiple servers. Something like

> Will you implement a default pool like mod_dbd does? This feature enables us to create a dynamic, not static, server list. For high availability clusters, for example.

Sounds interesting, I suppose. I just generate my configs from templates and they generate the correct server list based on whatever
Re: [PATCH] mod_dbd with more than one pool
On Mar 5, 2009, at 4:52 PM, Kevac Marko wrote:
> 80 columns is a little bit ancient requirement in the world of 22" LCDs, but ok, i'll fix that too.

Not when you have 4 code branches side by side... Also, netbooks are pretty popular as well. Most of my coding is done on a 15" laptop screen.

--Brian
Re: patch for handling headers_in and headers_out as tables in mod_lua
On 2/28/09 8:37 PM, Brian McCallister bri...@skife.org wrote:
> It could be just:
>
> apr_hash_set(dispatch, "document_root", APR_HASH_KEY_STRING,
>              makefun(req_content_encoding_field, APL_REQ_FUNTYPE_STRING, p));

Also, couldn't we build the dispatch hash once, and only once, and then just associate it with each apache2.request? This seems more efficient than building this array every time. It would also be nice to use a dispatch hash for setters as well.

--Brian
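The build-once dispatch idea can be sketched without httpd: map field names to getter functions in one static table, then index any number of request objects through it. `fake_req` and the getter names are made up for the sketch; mod_lua's real table uses apr_hash and its own accessor types.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for request_rec: just two string fields. */
typedef struct {
    const char *content_encoding;
    const char *document_root;
} fake_req;

typedef const char *(*getter)(const fake_req *);

static const char *get_enc(const fake_req *r)  { return r->content_encoding; }
static const char *get_root(const fake_req *r) { return r->document_root; }

/* Built once, shared by every request object - never rebuilt per request. */
struct entry { const char *key; getter fn; };
static struct entry dispatch[] = {
    { "content_encoding", get_enc },
    { "document_root",    get_root },
    { NULL, NULL }
};

/* What __index would do: look up the field name, call the getter. */
static const char *req_index(const fake_req *r, const char *key)
{
    for (struct entry *e = dispatch; e->key; e++)
        if (strcmp(e->key, key) == 0)
            return e->fn(r);
    return NULL;
}
```

A setter table for `__newindex` would have the same shape with `void (*setter)(fake_req *, const char *)` entries.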
Re: mod_wombat and mod_lua
On 2/27/09 2:56 PM, Brian McCallister bri...@skife.org wrote:
> Maybe virtually joining :-)

It may be interesting to have a virtual extension to the hackathon. IRC is good, but misses a lot of discussion, I think. Some actual voice interaction would be nice. I've seen you all before, so I don't really want video :P Thoughts?

I'd really like to get mod_lua standardized, fast, and in 2.2, if possible (or get 2.4 out very soon). Willing to help, just let me know how.

--Brian
Re: mod_wombat and mod_lua
On 2/25/09 11:23 AM, Brian McCallister bri...@skife.org wrote:
> It can use Brian's thread scope approach (though I actually want to change it to add the pool still, and backport all the rest of the trunk changes),

I was thinking about this, and I think having direct access to the thread from request_rec is not needed. If we manage the array of pools from within mod_lua, then you can just handle it there. This array only needs to get created if we have thread/server scoped VMs. My horrible patch I submitted a while back did it that way. Also, does the idea of a vm_spec still make sense?

I'd really like a buildable version for 2.2. FWIW, we hacked up several modules to add Lua support to them and I'd like to port these to the official mod_lua if possible, as we are using our hacked (well, now completely rewritten) version of mod_wombat. We have memcache, geoip, and some of our internal modules all with Lua support now. Usually the Lua glue code is less than 100 lines of C.

I wish we could have a hackathon. We could knock out a bunch of these things in one sitting, I think.

-- Brian Akins
Re: use of APR_SENDFILE_ENABLED in mod_disk_cache
On 2/16/09 5:06 AM, Niklas Edmundsson ni...@acc.umu.se wrote:
> +core_dir_config *coreconf = ap_get_module_config(r->per_dir_config,
> +                                                 &core_module);

This is a perfect example of why we need a call to hide core_module stuff from modules. We talked about this before and we are still propagating this, IMO, bad habit.

--bakins
Re: [PATCH] mod_dbd with more than one pool
On 2/11/09 4:29 PM, Kevac Marko ma...@kevac.org wrote:
> What do you think? Patch is ready, but it needs some testing before posting.

+1

I was looking to do the same thing to mod_memcache (which should be imported into trunk, IMO...)
Re: [PATCH] mod_dbd with more than one pool
On Feb 11, 2009, at 6:22 PM, Graham Leggett wrote:
> Would it make sense for mod_memcache to become a provider beneath mod_socache, or am I missing something?

mod_memcache really just provides the config glue for apr_memcache so that every module that wants to use apr_memcache doesn't have to write all the config glue themselves. Yeah, it could probably add some socache hooks as well. I'm adding some Lua glue right now. mod_memcache is already Apache licensed.
Re: [mod_lua] vm management
On 2/6/09 8:09 AM, Bertrand Mansion bmans...@mamasam.net wrote:
> What do you call slow? Do you have benchmarks?

Some round numbers:

No Lua at all: 40k/sec
Our hack version (based on older mod_wombat): 34k/sec
New mod_lua: 20k/sec

Most of this, it seems, is from the lua states being created for every single request pool. This is with some lua that runs on every single request.

> Do you have an example of applications that could benefit from this?

Sure. We have some lua that runs on (almost) every single request. We do a fair bit of requests a second. The build up and tear down of states - while cheaper than most scripting languages - is just unnecessary. We are careful about variable scoping, etc.

> Isn't this a way to shoot yourself in the foot?

We already have several custom C modules, so we are more than comfortable with load once, run many times.

Slightly OT: I had to stop using my work email for the list bcs the mailing list mail servers don't like our corporate mail server. So, it looks like I'm just another hack. A few folks can vouch for what I do with apache in my day job...
Re: [mod_lua] vm management
On 2/6/09 12:17 AM, Brian McCallister bri...@skife.org wrote:
> * One entry point to obtain VMs
>
> This is the apl_get_lua_state(..) function. It is passed the information required to either find, or create, the needed lua_State. This is:
> - the lifecycle pool to which it is bound
> - the file name to define stuff in the lua_State
> - package load paths for the lua_State
> - package load paths for lua cmodules
> - a callback and baton to be invoked if the lua_State is created

Do we really need paths and cpaths to be configurable in any way besides globally? In individual scripts, you can add more paths. Also, we already have the lua_open hook as well as the lua_request hook, so the callback doesn't really need to be there either.

So, I'd propose that we change it to this:

apl_get_lua_state(apr_pool_t *pool, const char *file, const char *data)

Pool is the lifecycle pool to which it is bound. File is, usually, the file name to define stuff in the lua_State. Data is a raw string to use (rather than a file); if present, then file is used as an identifier. The whole spec idea, then, is only for internal mod_lua use, not for general consumption. Thoughts?
Re: [mod_lua] vm management
On 2/6/09 9:40 AM, Bertrand Mansion bmans...@mamasam.net wrote:
> I remember having met a lot of problems with the old mod_wombat due to persistent states, even with cache set to never. In the end, I had to restart the server each time I modified a lua source file. It seems that you suggest to reintroduce features that would make this happen again.

Actually, that is exactly what I want. I don't want it to be the default, but I want it to be available to those who need it. The overhead of creating a lua state is very low. Most of the time is spent parsing the file (compiling it, if you will). If you need to handle several thousand requests per second, however, even this small amount of overhead is too much. I'd like to have the option - without maintaining a hacked version - to have persistent per-thread lua states.

--bakins
Re: [mod_lua] vm management
On Feb 6, 2009, at 12:17 AM, Brian McCallister wrote:
> * VMs are attached to apr pools.

+1

> * Concurrency is up to the client

What are you defining as the client?

> We should not expose configuration options which will create concurrency issues, such as attaching a VM to the server_rec pool. It is very possible for someone to programmatically use the module to do things like that, but if they do, any locking or resource pooling is up to them.

Sure, bcs attaching to the server_rec pool is silly. However, we do, IMO, need some other way besides r->pool in stock mod_lua. This is just too slow for a lot of things.

> * One entry point to obtain VMs

+1. I personally do not like exposing tons of struct info, but I'm -0 on that.

We want to be able to get back to stock mod_lua, but it's just too darned slow right now :( Having some type of pre-allocation and long-lived lua_States helps this a lot. All the C stuff is done that way (in memory once, run many times) and I'd like to be able to do the Lua stuff the same way without having to bolt on yet another bakins-specific module.

OT: toying with lua server pages :)
Mod_lua per thread states/scopes
I'm hacking on getting the former work I did for mod_wombat - per-thread lua states/scopes - into mod_lua. I like the new pool approach. My main reason for this is that we have some code that runs on almost every request; we are careful about variable scope, etc., and the overhead of creating a new state every time would just kill performance. Hopefully I'll have some code in the next couple of days...
Re: Mod_lua per thread states/scopes
Here's a patch just to give you an idea of what I'm thinking. It compiled. This is more to get some ideas going, to see how/if this fits into mod_lua.

The biggest issue is that the package_paths are per-dir, and in post_config you can't really get at those in a useful way. I precompile all of the server-scoped handlers in post_config so that there is no worry about that once we start taking requests.

This just makes an array of pools sized at thread_limit, so that basically each thread has its own pool. I use the connection id to determine what thread a request is running in, bcs there is no real way to get at that. I used worker here, so this may not work anywhere else, as it's tied to how worker calculates the connection id. Since the pool is per thread, we don't have to worry about any locking (in worker at least).

--bakins

mod_lua-server-scope.diff Description: Binary data
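The per-thread slot scheme described above can be shown in miniature: an array sized at thread_limit, indexed by `conn_id % thread_limit`. This assumes the worker MPM's connection-id layout (where the id encodes the thread slot), as the post says; the names and the `lua_state_stub` type are illustrative, not the patch itself.

```c
#include <stdlib.h>

/* Stand-in for a per-thread lua_State plus its pool. */
typedef struct {
    int initialized;
    int uses;
} lua_state_stub;

static lua_state_stub *states;
static long thread_limit;

/* "post_config": size the array once, before requests arrive. */
static void init_states(long limit)
{
    thread_limit = limit;
    states = calloc((size_t)limit, sizeof(*states));
}

/* Per request: the connection id maps every request on a given worker
 * thread to the same slot, so the slot is effectively thread-private
 * and (in worker) needs no locking. */
static lua_state_stub *state_for(long conn_id)
{
    lua_state_stub *s = &states[conn_id % thread_limit];
    if (!s->initialized)
        s->initialized = 1;   /* lazily create the VM on first use */
    s->uses++;
    return s;
}
```

Two requests whose connection ids land on the same thread slot reuse the same state instead of rebuilding it.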
Re: patch for handling headers_in and headers_out as tables in mod_lua
Is this in trunk? I don't see it, but I've been known to overlook stuff.

On 1/26/09 1:45 PM, Brian McCallister bri...@skife.org wrote:
> For anyone following, it has been applied :-)
>
> -- NEW
> function handle(r)
>     local host = r.headers_in['host']
>     r:puts(host)
>     -- and can now modify them
>     r.headers_in['X-XX-Fake'] = 'rabbits!'
>     r.headers_out['wombat'] = 'lua now!'
> end

Wasn't this the way it used to be?

--bakins
Re: patch for handling headers_in and headers_out as tables in mod_lua
On 2/5/09 1:51 PM, Brian McCallister bri...@skife.org wrote:
> Yep, Paul changed the internal impl to be less gross, but in doing so changed the API; I changed the impl to be not gross and restored the old API.

Okay, I see it now. I may take a crack at a little performance tuning. Setting up the dispatch table every single time doesn't seem necessary. We should be able to just do that once at httpd start time.

So, if I'm reading that correctly, this should work now as well??

r.content_type

And we would just hack up req_newindex to be able to do

r.content_type = "application/bakins"

I like the dispatch idea. I'll think about the same for newindex as well...
Re: 3.0 - Introduction
Make everything possible into a hook or use the provider model. Simple example: the way we determine if a connection can be kept alive is a monolithic function. This should be a hook. Disk I/O (read/write/seek, etc.) could be abstracted by providers, for example. Maybe we need full blown VFS?? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: 3.0 - Proposed Goals
Jim Jagielski wrote:
> This makes a lot of sense, but please NOT AJP...

It seems to me that staying with HTTP is the most scalable, easiest to debug and troubleshoot, and the most straightforward. Would be nice if we could do HTTP over unix domain sockets, for example. No need for a full TCP stack just to pass things back and forth between Apache and back-end processes.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
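The point above is that HTTP framing only needs a bidirectional byte stream, so an AF_UNIX socket carries it as readily as TCP. A minimal sketch (using `socketpair` for brevity instead of a bound unix-path socket; the request bytes and "frontend"/"backend" roles are illustrative):

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Push an HTTP request over a unix-domain stream and read it back on
 * the other end - no TCP/IP stack involved at any point. */
static ssize_t http_over_unix(char *buf, size_t len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    const char *req = "GET / HTTP/1.1\r\nHost: backend\r\n\r\n";
    if (write(sv[0], req, strlen(req)) < 0) {   /* "frontend" side */
        close(sv[0]); close(sv[1]);
        return -1;
    }

    ssize_t n = read(sv[1], buf, len);          /* "backend" side */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A real frontend/backend split would `bind()`/`connect()` an `AF_UNIX` path instead of using a socketpair, but the HTTP bytes on the wire are identical.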
mod_memcache??
I have a need to write a generic way to integrate apr_memcache into httpd. Basically, I have several other modules that use memcached as a backend and want to combine the boring stuff into a central place, i.e. configuration, stats, etc. We talked a little on list about this a few months ago, but no one ever did anything. Is anyone else interested in this? Has anyone done this?

Basically I was thinking there would be a single function:

apr_status_t ap_memcache_client(apr_memcache_t **mc)

which would simply give the user a client to use with the normal apr_memcache functions. The module could create the underlying mc at post_config. Basically, mod_memcache could have this config:

MemCacheServer memcache1.turner.com:9020 min=8 smax=16 max=64 ttl=5
MemCacheServer memcache4.turner.com:9020 min=8 smax=16 max=64 ttl=5
MemCacheServer memcache10.turner.com:9020 min=8 smax=16 max=64 ttl=5

or whatever. This would end the config duplication between various modules. This module could also add memcache stats to /server-status.

Comments?

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
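The proposed single-function contract can be sketched standalone. `ap_memcache_client` is the *proposed* (not existing) entry point, and `apr_memcache_stub` stands in for `apr_memcache_t`: one central module builds the client at post_config from the MemCacheServer lines, and everyone else just borrows it.

```c
#include <stddef.h>

/* Stand-in for apr_memcache_t; the real one would hold server lists,
 * connection pools, etc. built from the MemCacheServer directives. */
typedef struct {
    int server_count;
} apr_memcache_stub;

static apr_memcache_stub *global_mc;

/* "post_config": build the shared client exactly once. */
static void memcache_post_config(int configured_servers)
{
    static apr_memcache_stub mc;
    mc.server_count = configured_servers;
    global_mc = &mc;
}

/* The one public call other modules would use.  Returns 0 on success
 * (standing in for APR_SUCCESS), nonzero if no servers are configured. */
static int ap_memcache_client(apr_memcache_stub **mc)
{
    if (!global_mc)
        return -1;
    *mc = global_mc;
    return 0;
}
```

A consuming module would call `ap_memcache_client(&mc)` in its own post_config and then use plain apr_memcache calls on `mc`, with zero config glue of its own.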
Re: mod_cache+mod_rewrite behaviour
Niklas Edmundsson wrote:

Since mod_cache runs as a quick handler, matching based on URL would probably be the easiest, since you don't have the mime type info then. Maybe something like

CacheEnable disk /special/path ignore_query

Could add other options in future as well.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: httpd-proxy-scoreboard how to go on
Jim Jagielski wrote:
> I thought the whole idea was to abstract out the scoreboard so that it was easier for people to add and remove tables from the scoreboard... the so-called generic scoreboard.

I donated the mod_slotmem code a few weeks ago that was a rather simple way to do just that. Needs some work, but is at least a start.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: Wrong etag sent with mod_deflate
Henrik Nordstrom wrote:
> But the unique identity of the response entity is defined by request-URI + ETag and/or Content-Location. The cache is not supposed to evaluate Accept-* headers in determining the entity identity, only the origin server.

However, on an initial request (i.e., non-conditional) we do not have an etag from the client; we only have info like Host, URI, Accept-*, etc. So, how would the cache identify which entity to serve in this case?

> Please see RFC 2616 13.6 Caching Negotiated Responses; it explains how the RFC intends that caches should operate wrt Vary, ETag and Content-Location in full detail.

I have read it many times. In our case - cnn.com, etc. - we have decided to be RFC compliant from the client to the cache server. From the cache to the origin, however, we are not as concerned. In a reverse-proxy cache, this is not a big deal. However, in a normal forward-proxy cache, where one does not control both cache and origin, one must be more careful.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: Wrong etag sent with mod_deflate
Henrik Nordstrom wrote:
> On Mon 2006-12-11 at 14:25 -0500, Brian Akins wrote:
>> So, multiple variants of the same object can have the same Etag, but still be different cached objects.
> Your implementation ignores RFC 2616 13.6 Caching Negotiated Responses, but is otherwise fine. It's functionally compliant but not as effective as it could be.

That was a simplified explanation; we actually do not store a cache entry for every single variant. In our case the only thing we actually ever care about is whether or not you support gzip. So all the variants for "Vary: User-Agent, Accept-Encoding" actually boil down to 2 variants - gzip or no-gzip. One of the major reasons we quit using squid was its support for Vary. (This was pre-3.0, so things may have changed.) Of course, at the time httpd wasn't any better - but it was a lot easier to hack ;)

> Variants are identified by ETag or Content-Location. Only if there is neither ETag nor Content-Location in the response entity is the response entity identified by the Vary request headers.

Only conditional requests from clients, generally, have If-None-Match headers. So the only way for a cache, on an initial request from a client, to determine what object to serve is to use the client-supplied information - which doesn't include an Etag - so you have to, usually, rely on URI first, and then the Vary information.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: Wrong etag sent with mod_deflate
This is not a response to any post on this subject, but more of a comment. Here is a real-world example of how we use deflate and etags with our cache. (Note this is very similar to mod_cache, but I do not know the inner workings of it as well.)

1. Generate key from URI and ap_get_servername.
2. Open cached object. Is it Vary? No: go to step 5.
3. Is Vary. Generate new key.
4. Open cached object.
5. Check expiry time, exit if expired.
6. Load headers.
7. Call ap_meets_conditions (etags, IMS, etc.). If yes, return 304 (or whatever).
8. If not meets_conditions, serve from cache.

So, multiple variants of the same object can have the same Etag, but still be different cached objects.

This probably has no bearing on the current conversation, but perhaps I am not fully appreciating the core of the debate??

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
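The decision tail of the steps above (5 through 8) reduces to three outcomes. A stubbed skeleton, with the key/Vary re-keying of steps 1-4 folded into flags and the real predicates (expiry check, ap_meets_conditions) replaced by booleans:

```c
/* Possible outcomes of the serve path sketched above. */
enum serve_result { SERVE_FROM_CACHE, SERVE_304, EXIT_EXPIRED };

/* is_vary: steps 2-4 re-keyed and reopened the object (folded in here);
 * expired / meets_conditions: stubs for the real checks. */
static enum serve_result serve(int is_vary, int expired, int meets_conditions)
{
    (void)is_vary;                     /* re-keying already happened */
    if (expired)
        return EXIT_EXPIRED;           /* step 5 */
    if (meets_conditions)
        return SERVE_304;              /* steps 6-7 */
    return SERVE_FROM_CACHE;           /* step 8 */
}
```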
[PATCH] MaxKeepAliveConnections in http module
Rather ugly, but in the http module rather than the mpm. Still needs some work, but works in most cases here. I think there are instances where an aborted connection will not decrement the count, maybe. Had to add configuration stuff to http_core, since it, wrongly in my opinion, uses the core server config structures. Patch is against 2.2.3, since that's what we run mostly around here.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

diff -ur httpd-2.2.3/include/http_core.h httpd-2.2.3-keepalive/include/http_core.h
--- httpd-2.2.3/include/http_core.h	2006-07-11 23:38:44.000000000 -0400
+++ httpd-2.2.3-keepalive/include/http_core.h	2006-11-15 15:26:14.000000000 -0500
@@ -682,6 +682,8 @@
 
 /* -- */
 
+AP_DECLARE(int) ap_can_keepalive(request_rec *r);
+
 #ifdef __cplusplus
 }
 #endif
diff -ur httpd-2.2.3/modules/http/http_core.c httpd-2.2.3-keepalive/modules/http/http_core.c
--- httpd-2.2.3/modules/http/http_core.c	2006-07-24 09:34:19.000000000 -0400
+++ httpd-2.2.3-keepalive/modules/http/http_core.c	2006-11-15 17:20:25.000000000 -0500
@@ -35,6 +35,8 @@
 
 #include "mod_core.h"
 
+module AP_MODULE_DECLARE_DATA http_module;
+
 /* Handles for core filters */
 AP_DECLARE_DATA ap_filter_rec_t *ap_http_input_filter_handle;
 AP_DECLARE_DATA ap_filter_rec_t *ap_http_header_filter_handle;
@@ -42,6 +44,28 @@
 AP_DECLARE_DATA ap_filter_rec_t *ap_http_outerror_filter_handle;
 AP_DECLARE_DATA ap_filter_rec_t *ap_byterange_filter_handle;
 
+typedef struct {
+    int maxkaconn;
+} http_core_conf_t;
+
+static apr_uint32_t current_ka = 0;
+
+static void *create_http_conf(apr_pool_t *p, server_rec *s)
+{
+    http_core_conf_t *conf = apr_pcalloc(p, sizeof(http_core_conf_t));
+    return conf;
+}
+
+static const char *set_maxkaconn(cmd_parms *cmd, void *dummy,
+                                 const char *arg)
+{
+    http_core_conf_t *conf
+        = ap_get_module_config(cmd->server->module_config, &http_module);
+    conf->maxkaconn = atoi(arg);
+    return NULL;
+}
+
 static const char *set_keep_alive_timeout(cmd_parms *cmd, void *dummy,
                                           const char *arg)
 {
@@ -94,6 +118,9 @@
                   "or 0 for infinite"),
 AP_INIT_TAKE1("KeepAlive", set_keep_alive, NULL, RSRC_CONF,
               "Whether persistent connections should be On or Off"),
+AP_INIT_TAKE1("MaxKeepAliveConnections", set_maxkaconn, NULL, GLOBAL_ONLY,
+              "Maximum number of Keep-Alive connections per child, "
+              "or 0 for infinite"),
 { NULL }
 };
 
@@ -206,9 +233,43 @@
     return OK;
 }
 
+AP_DECLARE(int) ap_can_keepalive(request_rec *r)
+{
+    http_core_conf_t *conf
+        = ap_get_module_config(r->server->module_config, &http_module);
+    apr_uint32_t current;
+
+    if (conf->maxkaconn) {
+        current = apr_atomic_read32(&current_ka);
+
+        if (current >= conf->maxkaconn) {
+            return 0;
+        }
+    }
+
+    return 1;
+}
+
+static apr_status_t keepalive_cleanup(void *data)
+{
+    conn_rec *c = (conn_rec *)data;
+
+    /* need to check for abort?? Also, what happens when client closes
+     * and we are in keepalive?  should we register a cleanup on the
+     * connection pool? */
+    if (c->keepalive == AP_CONN_KEEPALIVE) {
+        apr_atomic_inc32(&current_ka);
+    }
+
+    return APR_SUCCESS;
+}
+
 static int http_create_request(request_rec *r)
 {
     if (!r->main && !r->prev) {
+        conn_rec *c = r->connection;
+        http_core_conf_t *conf
+            = ap_get_module_config(r->server->module_config, &http_module);
+
         ap_add_output_filter_handle(ap_byterange_filter_handle,
                                     NULL, r, r->connection);
         ap_add_output_filter_handle(ap_content_length_filter_handle,
@@ -217,6 +278,15 @@
                                     NULL, r, r->connection);
         ap_add_output_filter_handle(ap_http_outerror_filter_handle,
                                     NULL, r, r->connection);
+
+        if (conf->maxkaconn) {
+            /* this connection had been kept alive, but it's now active again */
+            if (c->keepalive == AP_CONN_KEEPALIVE) {
+                apr_atomic_dec32(&current_ka);
+            }
+            apr_pool_cleanup_register(r->pool, c, keepalive_cleanup,
+                                      keepalive_cleanup);
+        }
     }
 
     return OK;
@@ -265,7 +335,7 @@
     STANDARD20_MODULE_STUFF,
     NULL,              /* create per-directory config structure */
     NULL,              /* merge per-directory config structures */
-    NULL,              /* create per-server config structure */
+    create_http_conf,  /* create per-server config structure */
     NULL,              /* merge per-server config structures */
     http_cmds,         /* command apr_table_t */
     register_hooks     /* register hooks */
diff -ur httpd-2.2.3/modules/http/http_protocol.c
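Stripped of the httpd plumbing, the counting scheme in the patch above is: an atomic per-child counter that goes up when a connection enters keepalive, comes back down when that connection becomes active again, and is compared against the cap in ap_can_keepalive. A standalone sketch using C11 stdatomic instead of apr_atomic (the cap value and function names here are illustrative):

```c
#include <stdatomic.h>

/* Per-child count of connections currently sitting in keepalive. */
static atomic_uint current_ka;

/* MaxKeepAliveConnections; 0 means unlimited. */
static unsigned maxkaconn = 2;

/* The ap_can_keepalive check: allow keepalive only while under the cap. */
static int can_keepalive(void)
{
    return !maxkaconn || atomic_load(&current_ka) < maxkaconn;
}

/* Request-pool cleanup path: connection went into keepalive. */
static void conn_kept_alive(void)
{
    atomic_fetch_add(&current_ka, 1);
}

/* http_create_request path: a kept-alive connection is active again. */
static void conn_reactivated(void)
{
    atomic_fetch_sub(&current_ka, 1);
}
```

The patch's open question carries over directly: an aborted connection that never reenters http_create_request would leave the counter high, which is the "will not decrement the count" case noted above.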
non-blocking file buckets, core output, and 2.2.3
In reference to some mod_cache discussions: It seems, after some testing, that in 2.2.3 the core output filters will block when given file buckets, therefore stalling the entire brigade (i.e., slowing reads from proxy, cgi, etc.). This was a somewhat artificial test I did, but can someone confirm if something changed in trunk that allows file buckets to be handled differently...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: non-blocking file buckets, core output, and 2.2.3
Brian Akins wrote:
> This was a somewhat artificial test I did, but can someone confirm if something changed in trunk that allows file buckets to be handled differently...

Actually got off my rear and looked at it myself :) From the looks of it, core_output_filter in 2.2.3 does not have any special handling of file buckets and is in many ways different from trunk. FWIW.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_cache summary and plan
Davi Arnaut wrote:
> The solution consists of using the cache file as an output buffer by splitting the buckets into smaller chunks and writing them to disk. Once written (apr_file_write_full), a new file bucket is created with the offset and size of the just-written buffer. The old bucket is deleted.

Without having looked very much at the code, this approach sounds feasible. I'm still confused as to why we need the temporary brigade??? Why not swap the buckets?

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_disk_cache summarization
Niklas Edmundsson wrote:
> The comparison of your and Brian's experiences are two ends of extremes on high volume caches: one low hits/large files, the other high hits/small files. This should make for some useful tuning information. The extreme difference is what makes me think that we should acknowledge that they exist and provide the relevant knobs where necessary. As it looks right now, those knobs tend to be more OS/filesystem specific, but that might change as this evolves.

My thought on this is that we use providers, so in theory, you could use a different provider for the different types:

CacheEnable /largecrap large_disk_with_stat_sleep_thing
CacheEnable /normalstuff normal_disk

Perhaps the only difference between these two is the CACHE_IN mechanism (ie, serving from cache is the same). There is no reason, IMNSHO, to try to shove all the functionality under the sun into one cache provider. The current mod_disk_cache (before the mass patches) works for a large percentage of cases. I see no reason to mutilate it. Just do a new provider...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_disk_cache summarization
Plüm wrote:
> Agreed. If it turns out that the common code base between both cases is only small and it is complex to do both things in one provider, just make two providers out of them. The remaining common code could be factored out into a separate disk_cache_util.c file which is used by both providers.

It may be possible to even decide which provider to use based on various factors: object size (content-length), time of day, phase of moon, cpu usage, disk io, cache size, etc. Could be as simple as shoving the provider in an env like cache-provider = brians_wacky_memcache and mod_cache tries that one first.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: [PATCH] add MaxKeepAliveConns to event mpm
Brian Akins wrote:
> Allows you to limit the number of keepalive connections per child. Patched against 2.2.3 as trunk just seems broken right now. Did some light testing, seems to work.

After some looking, this really belongs in the http module. However, it is so intertwined with core server stuff that it may be a little tricky to do... Maybe I'll just submit a mod_max_keepalive...

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
mod_disk_cache summarization
Can someone please summarize the various patches for mod_disk_cache that have been floating around in the last couple of weeks? I have looked at the patches but wasn't real sure of the general philosophy/methodology behind them. Others may find it useful as well.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_offline
Graham Leggett wrote:
> - Cache-Control and Pragma headers must be stripped from requests into the cache.

Stripped, or ignored. Think CacheIgnoreCacheControl. We can control this by configuration on a per-virtual-host basis. We can also ignore query strings in the same way.

> - mod_cache must, if the backend response is 5xx, deliver the latest cached data the server has present, regardless of whether the cached entry is stale or not.

Yep. Is there anything else it would need to do? There is already a hook in mod_proxy that gets called when all origins are down (cache_request_status or something). Just need an optional function in mod_cache that says "serve and don't check expires." Also, don't delete expired data inside mod_cache. The htcache daemon should probably not delete expired stuff right away. Expired objects should get a grace period of some configurable length.

Make sense?

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: mod_offline
Issac Goldstand wrote:
>> - Cache-Control and Pragma headers must be stripped from requests into the cache.
> Why? Just because you are in offline mode doesn't mean other proxies between you and the client (or the client itself) are.

Graham is talking about headers coming from the client to the cache server, if I read this correctly.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: httpd 2.2 cache - disable and enable
wrote:
> Wouldn't it be easier to do a match on mime type, like ExpiresByType?

We do not know the mime type in quick_handler. In theory, you could disable/enable the CACHE_SAVE filter by mime type, but that seems a little messy, because we would check in quick_handler to see if it's cached.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: httpd 2.2 cache - disable and enable
Matthieu Estrade wrote:
> IMHO, a regexp-based cache enable or disable could be very useful for a default caching policy shipped with httpd. We could do per-default caching only on all images, css and all static content.

Some random thoughts: Personally, I think the cache rules matching methods should be provider based. ie, something like:

CacheEnable disk /images prefix      # prefix could be default
CacheEnable disk *.gif$ regex
CacheEnable mem cache_me match       # use strmatch stuff, faster than regex
CacheEnable disk - all               # matches everything. A little faster than "prefix /"
CacheDisable ftp protocol            # enable/disable based on protocol
CacheDisable 1.2.3.4 host            # don't cache things for this client

Of course, we already do almost all that stuff with everything else, but cache is a quick handler, so as Colm noted, Location, LocationMatch, etc. don't work. Unfortunately, if you move it to a normal handler, you lose a bit of performance. We cache a lot of stuff to avoid the mangled mess of rewrite rules that would run if mod_cache was not a quick handler on every single request.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
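The provider-based matching idea above boils down to picking a match function by the name given in the directive. A standalone sketch with two of the proposed methods, "prefix" and "regex" (the `matcher`/`lookup_matcher` names are illustrative, not an existing httpd API; POSIX regex stands in for pcre):

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

typedef int (*matcher)(const char *pattern, const char *uri);

/* "prefix": plain leading-string comparison - the cheap default. */
static int match_prefix(const char *pat, const char *uri)
{
    return strncmp(uri, pat, strlen(pat)) == 0;
}

/* "regex": compile-and-match; in a real provider the pattern would be
 * compiled once at config time, not per request. */
static int match_regex(const char *pat, const char *uri)
{
    regex_t re;
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int hit = regexec(&re, uri, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* What the CacheEnable parser would do with the last argument. */
static matcher lookup_matcher(const char *name)
{
    if (strcmp(name, "prefix") == 0) return match_prefix;
    if (strcmp(name, "regex") == 0)  return match_regex;
    return NULL;
}
```

A "match" provider built on apr_strmatch would slot into the same table, which is what makes the method pluggable per directive.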
Re: Coding style
Garrett Rooney wrote:
> Or the even more readable:
>
> rv = do_something(args);
>
> if (rv == APR_SUCCESS) {
> }

yuck! Think of all the harmless newlines you are senselessly wasting. Our children will have to code with no newlines if we do not conserve them today. Won't someone please think of the children.

Seriously, not a fan of that style. Doesn't matter that much to me, however. -0

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
[Fwd: Re: [PATCH] setenvif filter]
Bringing this up again. Adds a filter that allows mod_setenvif to act on response headers.

Original Message
Subject: Re: [PATCH] setenvif filter
Date: Wed, 31 May 2006 17:24:33 +0200
From: Francois Pesce [EMAIL PROTECTED]
Reply-To: dev@httpd.apache.org
To: dev@httpd.apache.org
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

These patches may fix the r->content_type behaviour. Are you OK with it?

-- Francois Pesce

2006/5/31, Brian Akins [EMAIL PROTECTED]:
> Francois PESCE wrote:
>> I've discussed a patch for mod_setenvif 2 years ago, and coded it at that time; it has been successfully used on various hosts in production since.
> You need to handle content type specially by checking r->content_type. For some reason, just doing apr_table_get(r->headers_out, "Content-type") would be null, but content_type would be set. See the patch I posted a few days ago. +1 in concept
> -- Brian Akins Lead Systems Engineer CNN Internet Technologies

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

mod_setenvif-2-2-x-2.patch Description: Binary data
mod_setenvif-2-0-x-2.patch Description: Binary data
Re: Regexp-based rewriting for mod_headers?
+1 on the patch -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: [patch 09/16] simplify array and table serialization
Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Niklas Edmundsson wrote: don't care about performance... Actually, cache on xfs mounted with atime doesn't seem to be a performance killer, oddly enough... Our frontends had no problems surviving 1k requests/s during the latest mozilla-update barrage. 1k requests/second is not really that much... 10k requests/second is more what I'm used to. XFS sucks for us as cache storage. It tends to crock under some traffic patterns (reads vs. writes). ext3 is actually more reliable for us. Reiserfs is interesting, but tends to go haywire from time to time. We clean our cache often because we have a really quick way to find the size and remove the oldest expired objects first. Every cache store gets recorded in SQLite with info about the object (size, mtime, expire time, url, key, etc.). Makes it trivial to write cron jobs to do cache management. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
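The cron-job side of such an index might look like the following in outline. The struct fields, function names, and purge policy here are all illustrative (the real index lived in SQLite); the point is just that with per-object records, "remove the oldest expired objects first" is a sort plus a scan, not a directory walk.

```c
#include <stdlib.h>

/* Illustrative record, mirroring the kind of info logged at store time. */
struct cache_rec {
    long size;      /* bytes on disk */
    long expires;   /* expiry time, arbitrary clock units */
};

/* qsort comparator: earliest expiry first */
static int by_expiry(const void *a, const void *b)
{
    const struct cache_rec *ra = a, *rb = b;
    return (ra->expires > rb->expires) - (ra->expires < rb->expires);
}

/* Purge expired entries, oldest expiry first, until at least 'need'
 * bytes are reclaimed; returns bytes freed.  A real job would also
 * unlink the files and DELETE the index rows here. */
long purge_expired(struct cache_rec *recs, int n, long now, long need)
{
    long freed = 0;
    int i;
    qsort(recs, n, sizeof(*recs), by_expiry);
    for (i = 0; i < n && freed < need; i++) {
        if (recs[i].expires < now) {
            freed += recs[i].size;
        }
    }
    return freed;
}
```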
Re: [patch 09/16] simplify array and table serialization
Davi Arnaut wrote: On 20/09/2006, at 10:16, Brian Akins wrote: Davi Arnaut wrote: Simplify the array and table serialization code, separating it from the underlying I/O operations. Probably faster to just put everything in an iovec (think writev). Probably no, apr_brigade_writev does (quite) the same. Doesn't mean apr_brigade_writev does it fast either... If the serialization simply returned an iovec, mod_mem_cache could use apr_pstrcatv and mod_disk_cache could use apr_file_writev. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
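The writev idea in miniature, as a plain POSIX sketch rather than the apr_brigade/apr_file API (write_fields is an invented name; mod_disk_cache would use apr_file_writev the same way):

```c
#include <string.h>
#include <sys/uio.h>

/* Sketch: instead of serializing header fields one write() at a time,
 * collect pointers into an iovec and hand the whole array to writev(),
 * so the entire table goes out in one syscall. */
ssize_t write_fields(int fd, const char *fields[], int nfields)
{
    struct iovec vec[16];
    int i;
    for (i = 0; i < nfields && i < 16; i++) {
        vec[i].iov_base = (void *)fields[i];   /* no copying, just pointers */
        vec[i].iov_len = strlen(fields[i]);
    }
    return writev(fd, vec, i);
}
```

The same iovec could be handed to apr_pstrcatv for an in-memory copy, which is what makes returning an iovec from the serializer attractive: each provider picks the sink.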
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Issac Goldstand wrote: I don't understand why bother getting so complex. Touch/truncate the body file when storing the header, and then a missing body means things have gone amok - retry the request. Conversely, a zero-length body, or one shorter than C-L, means another thread is working on the body. Unless 0 is a valid content-length, which it can be. Also, what about when we are reading something in without a known C-L, for example from an origin sending chunks? You're right, this is a tricky one, but there is a solution out there. Maybe we're attacking the problem from the wrong angle. Rather than modifying mod_cache, modify the garbage-collector (e.g., htcacheclean). Do a two-pass cleanup. I think it's insane that it has to traverse the directory structure to find the objects. There should be an index of objects. Traversing the tree can be a huge hit on large, busy structures. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_slotmem
Oden Eriksson wrote: On Wednesday 30 August 2006 10:37, Brian Akins wrote: With all the talk of a generic scoreboard, here's something I whipped up that allows any other module to have some amount of memory per worker slot. We have a different module in-house at CNN which does something similar. This one is a little rough around the edges, but gives an idea of what I was thinking about doing. Care to release this with a license? I guess the ASF can have it and use the Apache license. It is a little rough, but maybe some will find it useful. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Niklas Edmundsson wrote: Extra tracking sounds unnecessary if you can do it in a way that doesn't need it. It's not extra, it's just adding some tracking. When an object gets cached, log (sql, db, whatever) that /blah/foo/bar.html is cached as /cache/x/y/something.meta. Then it's very easy to ask the store "what is /blah/foo/bar.html cached as?" There may be multiples because of Vary. * Clients read from the cache as files are being cached. That's the hard one, IMO. * Only one session caches the same file. Easy to do if we use deterministic tmp files rather than the way we currently do it. Then all you have to do is use O_EXCL when creating temp files. * Header/body updates. Easier with separate files like mod_disk_cache does now. * No index/files out-of-sync issues. Ever. Hard to guarantee, but not impossible. Always add to the index when storing a file and remove when deleting. This should use something like providers so it's not in core cache code and can be easily modified. With locks, yes it's possible but also a hassle to get right with performance intact. Not really that hard. Trust me, it has been done... We, as a ftp mirror operated by a non-profit computer club, have a slightly different usecase with single files larger than machine RAM and a working set approx 40 times larger than RAM. Some bad design decisions in mod_disk_cache become really visible in this environment. Seems to me you should approach the problem differently, like rsyncing the mirrored content. I don't know your environment; that was just what I came up with off the top of my head. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
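The O_EXCL trick can be shown in a few lines. The path and helper name below are made up for illustration; the idea is that with a deterministic temp-file name per cache key, the kernel itself elects the single caching session, and the loser just serves the origin response without caching.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: try to become the one session caching this key.  Returns a
 * writable fd on success, or -1 if another session already holds the
 * deterministic temp file for this key. */
int begin_caching(const char *key_path)
{
    /* O_CREAT|O_EXCL is atomic: exactly one opener can create the file */
    int fd = open(key_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0 && errno == EEXIST) {
        return -1;   /* someone else is already caching this key */
    }
    return fd;       /* we won; write the object, then rename into place */
}
```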
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Graham Leggett wrote: I have not seen inside the htcacheclean code; why is the code reading the headers? In theory the cache should be purged based on last access time, deleted as space is needed. Everyone should be mounting cache directories noatime, unless they don't care about performance... Your patch is battle tested, and fixes some specific problems; the only issue that I think needs to be resolved is the question of whether a single file or multiple files are preferable, taking into account performance on platforms other than Linux as well. I'm very interested in this as well. Very good ideas that just need a little refinement. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Issac Goldstand wrote: I can see how other tracking information (like how often the cached entity is accessed, last access time, etc.) would be useful. Also, those statistics could be updated asynchronously using a queue, so that statistics collection doesn't slow down a busy web server. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
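A minimal single-threaded sketch of the deferred-statistics idea (all names invented; a real module would need a lock or a lock-free queue for concurrent writers): request handling only enqueues a tiny record, and a periodic drain applies the batch to the real store.

```c
#include <string.h>

#define QCAP 128

struct stat_event { char key[32]; long bytes; };

struct stat_event queue[QCAP];
int qhead, qtail;

/* Fast path, called per request: just record the event.  On a full
 * queue we drop rather than block the request thread. */
int stats_enqueue(const char *key, long bytes)
{
    int next = (qtail + 1) % QCAP;
    if (next == qhead) return -1;               /* full: drop, don't block */
    strncpy(queue[qtail].key, key, sizeof(queue[qtail].key) - 1);
    queue[qtail].key[sizeof(queue[qtail].key) - 1] = '\0';
    queue[qtail].bytes = bytes;
    qtail = next;
    return 0;
}

/* Slow path, run periodically (cron job, maintenance thread): apply
 * the whole batch.  Returns total bytes applied, for illustration. */
long stats_drain(void)
{
    long total = 0;
    while (qhead != qtail) {
        total += queue[qhead].bytes;            /* update the real store here */
        qhead = (qhead + 1) % QCAP;
    }
    return total;
}
```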
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Niklas Edmundsson wrote: Will it be possible to do away with one file for headers and one file for body in mod_disk_cache with this scheme? The thing is that I've been pounding seriously at mod_disk_cache to make it able to sustain rather heavy load on not-so-heavy equipment, and part of that effort was to wrap headers and body into one file, mainly for the following purposes: The separate header and body files work wonderfully for performance (filling multiple gigabit interfaces and/or 30k requests/sec on rather modest hardware). If you have them all in one, it can make the sendfile for the body cumbersome. If you somehow track what entries are in the cache, it is very easy to purge entries. At ApacheCon, I'll talk some about our version of mod_cache. Unfortunately, I can't share code :( But I can tell you the separate-files way is not a performance or housekeeping issue. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: mod_cache responsibilities vs mod_xxx_cache provider responsibilities
Niklas Edmundsson wrote: If I remember correctly the code in 2.2.3 only does whole-file revalidation, No, it can have a stale handle that it makes fresh if it gets a 304. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: svn commit: r440337 - in /httpd/httpd/trunk: ./ include/ modules/arch/netware/ modules/experimental/ modules/generators/ modules/http/ modules/mappers/ modules/proxy/ modules/ssl/ server/ server/m
Ruediger Pluem wrote: 1. If we stick to AP_DECLARE(const char *) ap_get_server_version(void); and do #define ap_get_server_banner ap_get_server_version I hate macros. Just do it like:

AP_DECLARE(const char *) ap_get_server_banner()
{
    return ap_get_server_version();
}

That way, it gets a symbol rather than disappearing after compile. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
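The difference is easy to demonstrate outside httpd. These are plain-C stand-ins for the two functions (AP_DECLARE and the real httpd names omitted): only a real wrapper function survives into the symbol table, so its address can be taken or looked up at load time, which a macro alias cannot do.

```c
/* Stand-in for ap_get_server_version(). */
const char *get_server_version(void)
{
    return "Apache/2.2";
}

/* The thin-wrapper approach: a real function with its own address,
 * unlike "#define get_server_banner get_server_version", which
 * vanishes after preprocessing and leaves nothing to link against. */
const char *get_server_banner(void)
{
    return get_server_version();
}
```

This matters for dynamic symbol lookup (e.g., optional functions resolved by name) and for anything that stores a function pointer to the banner accessor.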
portfs bug t2000 and 2.2
We tested a Sun t2000 with httpd 2.2. It did okay. Now, Sun says there is an issue with 2.2 and portfs on Solaris 10 on the t2000. Not real sure what this means. Anyone else heard this? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
mod_slotmem
With all the talk of a generic scoreboard, here's something I whipped up that allows any other module to have some amount of memory per worker slot. We have a different module in-house at CNN which does something similar. This one is a little rough around the edges, but gives an idea of what I was thinking about doing. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies

#ifndef __MOD_SLOTMEM__
#define __MOD_SLOTMEM__

typedef struct ap_slotmem_t ap_slotmem_t;

typedef apr_status_t ap_slotmem_callback_fn_t(void *mem, void *data, apr_pool_t *pool);

AP_DECLARE(apr_status_t) ap_slotmem_do(ap_slotmem_t *s, ap_slotmem_callback_fn_t *func,
                                       void *data, apr_pool_t *pool);
AP_DECLARE(apr_status_t) ap_slotmem_create(ap_slotmem_t **new, const char *name,
                                           apr_size_t item_size, apr_pool_t *pool);
AP_DECLARE(apr_status_t) ap_slotmem_mem(ap_slotmem_t *s, conn_rec *c, void **mem);

#endif

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "http_connection.h"
#include "ap_config.h"
#include "http_log.h"
#include "scoreboard.h"
#include "apr_strings.h"
#include "apr_shm.h"
#include "ap_mpm.h"
#include <sys/types.h>
#include <unistd.h>
#include "mod_status.h"
#include "mod_slotmem.h"

module AP_MODULE_DECLARE_DATA slotmem_module;

static int server_limit = 0;
static int thread_limit = 0;
static int total_limit = 0;
static apr_array_header_t *slotmem_array = NULL;

struct ap_slotmem_t {
    apr_pool_t *pool;
    const char *name;
    apr_shm_t *shm;
    void *mem;
    apr_size_t item_size;   /* size of each item */
    apr_size_t total_size;
};

AP_DECLARE(apr_status_t) ap_slotmem_do(ap_slotmem_t *s, ap_slotmem_callback_fn_t *func,
                                       void *data, apr_pool_t *pool)
{
    apr_status_t rv = APR_SUCCESS;
    int i;
    void *mem;

    for (i = 0; i < total_limit; i++) {
        mem = s->mem + (i * s->item_size);
        if ((rv = func(mem, data, pool)) != APR_SUCCESS) {
            return rv;
        }
    }
    return rv;
}

AP_DECLARE(apr_status_t) ap_slotmem_create(ap_slotmem_t **n, const char *name,
                                           apr_size_t item_size, apr_pool_t *pool)
{
    ap_slotmem_t *s, **new;

    s = apr_pcalloc(pool, sizeof(ap_slotmem_t));
    s->pool = pool;
    s->name = apr_pstrdup(pool, name);
    s->item_size = item_size;
    new = (ap_slotmem_t **) apr_array_push(slotmem_array);
    (*new) = (ap_slotmem_t *) s;
    *n = s;
    return APR_SUCCESS;
}

AP_DECLARE(apr_status_t) ap_slotmem_mem(ap_slotmem_t *s, conn_rec *c, void **mem)
{
    /* this should work for all mpm's */
    void *d = s->mem + (c->id * s->item_size);
    *mem = d;
    if (!d) {
        ap_log_cerror(APLOG_MARK, APLOG_ERR, 0, c,
                      "ap_slotmem_mem: d is NULL, %ld, %p", c->id, s->mem);
        return APR_EGENERAL;
    }
    return APR_SUCCESS;
}

static int slotmem_status(request_rec *r, int flags)
{
    int i;
    ap_slotmem_t *sb, **list;

    if (!(flags & AP_STATUS_SHORT)) {
        ap_rputs("<hr /><b>slotmems</b>\n", r);
        ap_rputs("<table border=\"1\"><tr><td><b>Name</b></td>"
                 "<td><b>Item Size</b></td><td><b>Total Size</b></td></tr>\n", r);
    }
    list = (ap_slotmem_t **) slotmem_array->elts;
    for (i = 0; i < slotmem_array->nelts; i++) {
        sb = list[i];
        if (!(flags & AP_STATUS_SHORT)) {
            ap_rprintf(r, "<tr><td>%s</td><td>%" APR_SIZE_T_FMT
                       "</td><td>%" APR_SIZE_T_FMT "</td></tr>\n",
                       sb->name, sb->item_size, sb->total_size);
        }
        else {
            ap_rprintf(r, "slotmem: %s %" APR_SIZE_T_FMT " %" APR_SIZE_T_FMT "\n",
                       sb->name, sb->item_size, sb->total_size);
        }
    }
    if (!(flags & AP_STATUS_SHORT)) {
        ap_rputs("</table>\n", r);
    }
    return OK;
}

static int post_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *s)
{
    apr_size_t amt;
    apr_time_t now;
    pid_t pid;
    char *file;
    ap_slotmem_t *sb, **list;
    int i;
    apr_status_t rv;
    const char *temp_dir;
    char *data = NULL;

    /* have we been here before? */
    apr_pool_userdata_get((void **) &data, __FILE__, s->process->pool);
    if (!data) {
        apr_pool_userdata_set((const void *) 1, __FILE__, apr_pool_cleanup_null,
                              s->process->pool);
        return OK;
    }
    ap_mpm_query(AP_MPMQ_HARD_LIMIT_THREADS, &thread_limit);
    ap_mpm_query(AP_MPMQ_HARD_LIMIT_DAEMONS, &server_limit);
    total_limit = thread_limit * server_limit;
    now = apr_time_now();
    pid = getpid();
    if ((rv = apr_temp_dir_get(&temp_dir, p)) != APR_SUCCESS) {
        ap_log_error(APLOG_MARK, APLOG_STARTUP, 0, s, "apr_temp_dir_get failed");
        return rv;
    }
    /* XXX: currently, this uses one shm per slotmem.  We could try to pack
     * them all into a few larger shm's */
    list = (ap_slotmem_t **) slotmem_array->elts;
    for (i = 0; i < slotmem_array->nelts; i++) {
        sb = list[i];
        sb->total_size = amt = sb->item_size
Re: mod_slotmem
Jean-frederic Clere wrote: Nice stuff, but I am not sure that having shared memory per slot scales when having a lot of entries, though it does make sure that one process/thread won't overwrite another one's slot. It scales very nicely. We run with MaxClients set between 16k-32k with no issues. Our item sizes range from 64-264 bytes. At 264 bytes and 24k max clients, it requires about 6.5 MB of shared memory, and we usually have 8-12 slotmems of varying sizes, so we use about 32-64 MB of shared memory for all the slotmems. It is very fast as there is no locking necessary: each slot is only ever written to by a single writer (it's tied to a connection id). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
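The per-slot, single-writer idea can be sketched stand-alone with malloc in place of apr_shm. Names echo the posted header but this is not the real module: one fixed-size slot per worker, addressed by id, so each slot has exactly one writer and reads need no locking.

```c
#include <stdlib.h>

struct slotmem {
    size_t item_size;   /* size of each slot */
    int nslots;         /* threads x procs in the real module */
    char *base;         /* backing memory (apr_shm in the real thing) */
};

struct slotmem *slotmem_create(size_t item_size, int nslots)
{
    struct slotmem *s = malloc(sizeof(*s));
    s->item_size = item_size;
    s->nslots = nslots;
    s->base = calloc(nslots, item_size);   /* zeroed, like fresh shm */
    return s;
}

/* Same arithmetic as ap_slotmem_mem: slot = base + id * item_size.
 * With id taken from the connection, each slot has a single writer. */
void *slotmem_mem(struct slotmem *s, int id)
{
    return s->base + (size_t)id * s->item_size;
}

/* Visit every slot, accumulating whatever the callback returns;
 * mirrors the shape of ap_slotmem_do. */
long slotmem_do(struct slotmem *s, long (*func)(void *mem))
{
    long acc = 0;
    int i;
    for (i = 0; i < s->nslots; i++) {
        acc += func(slotmem_mem(s, i));
    }
    return acc;
}

/* Example callback: treat each slot as a long counter. */
long read_counter(void *mem)
{
    return *(long *)mem;
}
```

The memory math in the post falls straight out of this layout: 264-byte slots times 24k slots is about 6.5 MB per slotmem.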
Re: Memory usage in apache
Ian Holsman wrote: Hi. now that I am a 'founder' of my own and using shared hosting, and no longer have millions of machines to play with, I am starting to notice how much memory apache is using. We generally have 4GB of RAM on our webservers and let Apache use almost every bit of it... Could just go the virtual server route a la Xen or Virtuozzo. Lots of fairly reasonable plans out there. so.. I was wondering if anybody has looked at a lightweight apache, or other ways to reduce its memory footprint. Once you load something like mod_python or mod_perl, it's not really Apache memory that's your problem. Apache can be slimmed down to 8-32 MB (depending on modules and config) fairly easily just by removing unwanted cruft. Most mod_xxx programming stuff tends to cache a lot of stuff in RAM. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
cache and proxy
If I am reading the code correctly, we only skip setting the Server header when r->proxyreq != PROXYREQ_NONE. So on a cached response, even if the original was reverse-proxied, we set our (Apache) Server header?? -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: svn commit: r426604 - in /httpd/httpd/branches/httpd-proxy-scoreboard: modules/proxy/ support/
Jim Jagielski wrote: I thought that this was about abstracting out scoreboard so that other modules could have scoreboard-like access without mucking around with the real scoreboard... +1. The proxy could just use this mechanism. We need to separate the two issues. I am all in favor of a generic scoreboard, that, in the future, the real scoreboard might use. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: load balancer cluster set
Guy Hulbert wrote: However, you may not be able to wait until the linux router project picks this up (but it might be worth looking to see what is available). Most of the load-balancing we are discussing on this list is not for directly customer facing applications. These are proxies for application servers generally, but they need to be highly available. We are not trying to replace Cisco CSM's. But a hardware HTTP-only aware $20k device is not needed when I just need to load balance an app across 4 tomcat instances, for example. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Scoreboard was Re: load balancer cluster set
I've seen all the traffic on the scoreboard and this is very useful context ... Also, I am using a similar scoreboard mechanism to collect lots of per-worker stats without the ExtendedStatus overhead. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: load balancer cluster set
Guy Hulbert wrote: That's the ultimate case, after all :-) Not necessarily. Google's answer is to throw tons of hardware at stuff. Which is great if you have unlimited space, power, and cooling. Some other sites do some rather interesting things with a relatively small number of servers -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: load balancer cluster set
Guy Hulbert wrote: The point of contention was scalability ... from a human point of view it is really annoying to have to solve a problem twice, but from the business pov, outgrowing your load balancer might only be a good thing. Yes. But most load balancers can only do layer 7 load balancing. Sometimes it is necessary to have very application-specific routing. Also, in general, most hardware load balancers base their algorithms on things such as response time. Sometimes it is necessary to know the general health of the backend servers. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: svn commit: r423444 - in /httpd/httpd/branches/httpd-proxy-scoreboard/modules/mem: ./ Makefile.in config5.m4 mod_plainmem.c mod_scoreboard.c mod_sharedmem.c slotmem.h
Jean-frederic Clere wrote: Do you mean the proxy back-end connections? I am thinking of a more general-purpose slotmem, not particularly tied to proxy. Maybe have some wrapper functions that create a slotmem based on threads x procs that can be accessed using r->connection (internally, slotmem could use r->connection->id). -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: svn commit: r423444 - in /httpd/httpd/branches/httpd-proxy-scoreboard/modules/mem: ./ Makefile.in config5.m4 mod_plainmem.c mod_scoreboard.c mod_sharedmem.c slotmem.h
Ruediger Pluem wrote: +static apr_status_t ap_slotmem_create(ap_slotmem_t **new, const char *name, apr_size_t item_size, int item_num, apr_pool_t *pool) In my thought of a slotmem or scoreboard, item_num is max threads * max procs, just like the normal scoreboard. Or has this morphed into something completely different? I can see uses for this type as well. Would be nice to have a function somewhere to get the current connection's scoreboard slot, perhaps... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in mod_proxy
Jean-frederic Clere wrote: Using your ideas and the explanations, I now have mod_proxy using the scoreboard via a scoreboard provider ;-) Find enclosed the code. Any comments? The next step is to write a shared memory scoreboard provider. I was thinking the scoreboard code would be in something like mod_scoreboard that other modules (like mod_proxy) would use. An apr_shm provider is rather trivial. Left as an exercise to the reader ;) -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in
Jim Jagielski wrote: ...what is the actual scoreboard and what is the proxy's segment of scoreboard space... E.g., the ap_scoreboard_* stuff implies that it's Apache's real scoreboard. Like I said, I think this scoreboard stuff should be more generic than just proxy. It could be renamed to avoid confusion with the real scoreboard. But there is no reason the real scoreboard couldn't use mod_scoreboard itself; it would just have to be core. Probably best to have mod_scoreboard now and merge the two later. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in mod_proxy
Jim Jagielski wrote: If this is data that needs to be accessed from non-proxy modules then yes, I agree. A basic API could look like this. By worker, I am thinking in the MPM sense, not the proxy sense. I guess slot may be a better term:

/* used for ap_scoreboard_do.  mem is the memory associated with a worker,
 * data is what is passed to ap_scoreboard_do.  pool is the pool used to
 * create the scoreboard */
typedef apr_status_t ap_scoreboard_callback_fn_t(void *mem, void *data, apr_pool_t *pool);

/* call the callback on all worker slots */
AP_DECLARE(apr_status_t) ap_scoreboard_do(ap_scoreboard_t *s,
                                          ap_scoreboard_callback_fn_t *func,
                                          void *data, apr_pool_t *pool);

/* create a new scoreboard in which each item's size is item_size.  name is
 * a key used for debugging and in mod_status output.  This would create
 * the shared memory, basically */
AP_DECLARE(apr_status_t) ap_scoreboard_create(ap_scoreboard_t **new, const char *name,
                                              apr_size_t item_size, apr_pool_t *pool);

/* get the memory associated with this worker slot.  use c->id or c->sbh
 * to get the offset into shared memory */
AP_DECLARE(apr_status_t) ap_scoreboard_mem(ap_scoreboard_t *s, conn_rec *c, void **mem);

Thoughts? Something very similar to this is used by several very busy web sites... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in
Jim Jagielski wrote: +1. For example, a memcached-based scoreboard would be pretty cool ;) Maybe mod_scoreboard could use a provider mechanism to actually implement the scoreboard. Maybe have an ap_scoreboard_create_ex where you could explicitly name a provider. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in
Jim Jagielski wrote: Yeah, that's what I was thinking as well! default could just use apr_shm. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in mod_proxy
Jean-frederic Clere wrote: With such an interface you assume only one process will access one slot... That is what the scoreboard allows: updates from different processes on the same slot. Should we have an ap_slot_read_lock() and an ap_slot_unlock() for that? No; we don't have such for the built-in scoreboard. Anything can read the scoreboard, but only the current worker slot can change it. That's why, in the sample API, you pass a conn_rec to get the memory. If it's slow, people won't use it. Semaphores are generally slow. Enforcing it by convention like we currently do seems reasonable. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in mod_proxy
Jim Jagielski wrote: Having some external (or even internal) process update a slot that isn't its own is dangerous. And the required locking would be slow. In my own hacked proxy, an external healthchecker and the proxy share a piece of shared memory that is read-only by apache and read-write by the external health check. This is not scoreboard info, just some health info. The 2 are separate things. -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: Additing a storage for the shared information of the worker in mod_proxy
Jean-frederic Clere wrote: Hi, I am still trying to replace the scoreboard by shared memory to store the shared information of the workers, I am now thinking to get this by adding modules like the prototype I have enclosed (that is a patch against trunk). We really need a generic scoreboard type module with a relatively simple API to add and access per worker memory. I have some ideas if anyone wants to hear them... -- Brian Akins Chief Operations Engineer Turner Digital Media Technologies
Re: [PATCH] setenvif filter
Anyone else care to vote on this so it can, possibly, get committed? Francois Pesce wrote: These patches may fix the r->content_type behaviour. Are you OK with it? -- *Francois Pesce* 2006/5/31, Brian Akins [EMAIL PROTECTED]: Francois PESCE wrote: I discussed a patch for mod_setenvif 2 years ago and coded it at that time; it has been successfully used on various hosts in production since. You need to handle content type specially by checking r->content_type. For some reason, just doing apr_table_get(r->headers_out, "Content-Type") would return NULL, but content_type would be set. See the patch I posted a few days ago. +1 in concept -- Brian Akins Lead Systems Engineer CNN Internet Technologies -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: hook call tracing [was: debug apache]
William A. Rowe, Jr. wrote: This is definitely useful for module developers, and should be added. +1. It is also useful when you have strange things happening on production web servers. I, too, did a rather ugly hack that showed the current hook using the mod_status hook, and it has come in handy. It is, however, very ugly. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: [PATCH] setenvif filter
Francois Pesce wrote: These patches may fix the r->content_type behaviour. Are you OK with it? +1 -- Brian Akins Lead Systems Engineer CNN Internet Technologies
[PATCH] add notfound to mod_rewrite rule
patch against 2.2.2 (but should work on most other versions). Adds the notfound (NF) flag to RewriteRule that will cause it to return HTTP_NOT_FOUND for matched requests, similar to the gone and forbidden options.

--- mod_rewrite.c.bak	2006-06-01 15:25:48.0 -0400
+++ mod_rewrite.c	2006-06-01 15:29:18.0 -0400
@@ -3264,6 +3264,11 @@
                  || !strcasecmp(key, "ocase")) { /* nocase */
             cfg->flags |= RULEFLAG_NOCASE;
         }
+        else if (((*key == 'F' || *key == 'f') && !key[1])
+                 || !strcasecmp(key, "otfound")) { /* notfound */
+            cfg->flags |= (RULEFLAG_STATUS | RULEFLAG_NOSUB);
+            cfg->forced_responsecode = HTTP_NOT_FOUND;
+        }
         else {
             ++error;
         }

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: [PATCH] add notfound to mod_rewrite rule
André Malo wrote: We don't need this, because you can use [R=404]. F and G are already just syntactic sugar. From the docs: 'redirect|R [=code]' (force redirect) Prefix Substitution with http://thishost[:thisport]/ (which makes the new URL a URI) to force an external redirection. If no code is given, an HTTP response of 302 (MOVED TEMPORARILY) will be returned. If you want to use other response codes in the range 300-400, simply specify the appropriate number or use one of the following symbolic names: temp (default), permanent, seeother. Use this for rules to canonicalize the URL and return it to the client - to translate ``/~'' into ``/u/'', or to always append a slash to /u/user, etc. So do the docs need to be updated to say "return arbitrary HTTP code"? BTW, this does work. Patch withdrawn. Somebody needs to fix the docs... -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: [PATCH] setenvif filter
Francois PESCE wrote: I discussed a patch for mod_setenvif 2 years ago and coded it at that time; it has been successfully used on various hosts in production since. You need to handle content type specially by checking r->content_type. For some reason, just doing apr_table_get(r->headers_out, "Content-Type") would return NULL, but content_type would be set. See the patch I posted a few days ago. +1 in concept -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: mod_disk_cache read-while-caching patch (try2)
Niklas Edmundsson wrote: In short, the mod_disk_cache read-while-caching patch is usable now and I'd like to contribute what's possible upstream. Does it make more sense at this time to do all these changes in another module and leave mod_disk_cache stable and usable? Call it disk2 or something... CacheEnable disk2 / -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: [PATCH] mod_disk_cache early size-check
Niklas Edmundsson wrote: This patch takes advantage of the possibility to do the size-check of the file to be cached early. obj->vobj = dobj = apr_pcalloc(r->pool, sizeof(*dobj)); Shouldn't this be in mod_cache so that all providers do not have to duplicate this logic? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
[PATCH] setenvif filter
This patch adds a filter to mod_setenvif that lets it match against response headers as well as request headers. It is probably a horrible implementation, but is submitted to encourage others to think about the idea. It changes the configuration to allow another optional field that designates the mode of the match (default is request):

SetEnvIf response Content-Type text/* is_text=1

will match against the response header Content-Type. The main purpose of this is to allow configurations such as:

AddOutputFilterByType DEFLATE text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
SetEnvIf response Content-Type text/html user-agent-vary=1
Header append Vary User-Agent env=user-agent-vary

With this patch, the correct Vary headers are added in a reverse proxy situation. Most of the code was adapted from mod_headers. Thoughts? -- Brian Akins Lead Systems Engineer CNN Internet Technologies

--- mod_setenvif.c.bak	2006-05-23 10:08:56.0 -0400
+++ mod_setenvif.c	2006-05-23 11:03:06.0 -0400
@@ -94,6 +94,8 @@
 #include "http_log.h"
 #include "http_protocol.h"
 
+#define SETENVIF_REQUEST 1
+#define SETENVIF_RESPONSE 2
 
 enum special {
     SPECIAL_NOT,
@@ -113,12 +115,15 @@
     apr_table_t *features;      /* env vars to set (or unset) */
     enum special special_type;  /* is it a special header ? */
     int icase;                  /* ignoring case? */
+    int mode;                   /* request or response */
 } sei_entry;
 
 typedef struct {
     apr_array_header_t *conditionals;
 } sei_cfg_rec;
 
+static ap_filter_rec_t *setenvif_output_filter_handle = NULL;
+
 module AP_MODULE_DECLARE_DATA setenvif_module;
 
 /*
@@ -249,7 +254,7 @@
 }
 
 static const char *add_setenvif_core(cmd_parms *cmd, void *mconfig,
-                                     char *fname, const char *args)
+                                     char *fname, int mode, const char *args)
 {
     char *regex;
     const char *simple_pattern;
@@ -304,6 +309,7 @@
 
     /* no match, create a new entry */
     new = apr_array_push(sconf->conditionals);
+    new->mode = mode;
     new->name = fname;
     new->regex = regex;
     new->icase = icase;
@@ -400,15 +406,35 @@
 static const char *add_setenvif(cmd_parms *cmd, void *mconfig,
                                 const char *args)
 {
-    char *fname;
-
+    char *fname = NULL;
+    int mode = SETENVIF_REQUEST;
+
     /* get header name */
     fname = ap_getword_conf(cmd->pool, args);
-    if (!*fname) {
+    /* is this a mode? */
+
+    if (!fname) {
+        return apr_pstrcat(cmd->pool, "Missing header-field name for ",
+                           cmd->cmd->name, NULL);
+    }
+
+    if (!strcasecmp(fname, "request")) {
+        mode = SETENVIF_REQUEST;
+        fname = NULL;
+    } else if (!strcasecmp(fname, "response")) {
+        mode = SETENVIF_RESPONSE;
+        fname = NULL;
+    }
+
+    if (!fname) {
+        fname = ap_getword_conf(cmd->pool, args);
+    }
+
+    if (!fname) {
         return apr_pstrcat(cmd->pool, "Missing header-field name for ",
                            cmd->cmd->name, NULL);
     }
-    return add_setenvif_core(cmd, mconfig, fname, args);
+    return add_setenvif_core(cmd, mconfig, fname, mode, args);
 }
 
 /*
@@ -418,7 +444,7 @@
  */
 static const char *add_browser(cmd_parms *cmd, void *mconfig, const char *args)
 {
-    return add_setenvif_core(cmd, mconfig, "User-Agent", args);
+    return add_setenvif_core(cmd, mconfig, "User-Agent", SETENVIF_REQUEST, args);
 }
 
 static const command_rec setenvif_module_cmds[] =
@@ -444,7 +470,7 @@
  * signal which call it is by having the earlier one pass a flag to the
 * later one.
 */
-static int match_headers(request_rec *r)
+static int match_headers(request_rec *r, int mode)
 {
     sei_cfg_rec *sconf;
     sei_entry *entries;
@@ -454,7 +480,14 @@
     int i, j;
     char *last_name;
     ap_regmatch_t regm[AP_MAX_REG_MATCH];
-
+    apr_table_t *headers;
+
+    if (SETENVIF_RESPONSE == mode) {
+        headers = r->headers_out;
+    } else {
+        headers = r->headers_in;
+    }
+
     if (!ap_get_module_config(r->request_config, &setenvif_module)) {
         ap_set_module_config(r->request_config, &setenvif_module,
                              SEI_MAGIC_HEIRLOOM);
@@ -468,9 +501,17 @@
     entries = (sei_entry *) sconf->conditionals->elts;
     last_name = NULL;
     val = NULL;
+
     for (i = 0; i < sconf->conditionals->nelts; ++i) {
         sei_entry *b = &entries[i];
-
+
+        if (b->mode != mode) {
+            continue;
+        }
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
+                     "setenvif: trying %s", b->name);
+
         /* Optimize the case where a bunch of directives in a row use the
          * same header.  Remember we don't need to strcmp the two header
          * names because we made sure the pointers were equal during
@@ -505,7 +546,7 @@
          * headers
Re: [PATCH] setenvif filter
Here's a newer version with some special handling for content-type. In
response headers, we need to match r->content_type rather than the header.

-- Brian Akins Lead Systems Engineer CNN Internet Technologies

--- mod_setenvif.c.bak	2006-05-23 10:08:56.0 -0400
+++ mod_setenvif.c	2006-05-23 16:55:50.0 -0400
@@ -94,6 +94,8 @@
 #include "http_log.h"
 #include "http_protocol.h"
 
+#define SETENVIF_REQUEST 1
+#define SETENVIF_RESPONSE 2
+
 enum special {
     SPECIAL_NOT,
@@ -102,7 +104,8 @@
     SPECIAL_REQUEST_URI,
     SPECIAL_REQUEST_METHOD,
     SPECIAL_REQUEST_PROTOCOL,
-    SPECIAL_SERVER_ADDR
+    SPECIAL_SERVER_ADDR,
+    SPECIAL_CONTENT_TYPE
 };
 
 typedef struct {
     char *name;                 /* header name */
@@ -113,12 +116,15 @@
     apr_table_t *features;      /* env vars to set (or unset) */
     enum special special_type;  /* is it a special header ? */
     int icase;                  /* ignoring case? */
+    int mode;                   /* request or response */
 } sei_entry;
 
 typedef struct {
     apr_array_header_t *conditionals;
 } sei_cfg_rec;
 
+static ap_filter_rec_t *setenvif_output_filter_handle = NULL;
+
 module AP_MODULE_DECLARE_DATA setenvif_module;
 
 /*
@@ -249,7 +255,7 @@
 }
 
 static const char *add_setenvif_core(cmd_parms *cmd, void *mconfig,
-                                     char *fname, const char *args)
+                                     char *fname, int mode, const char *args)
 {
     char *regex;
     const char *simple_pattern;
@@ -304,6 +310,7 @@
 
     /* no match, create a new entry */
     new = apr_array_push(sconf->conditionals);
+    new->mode = mode;
     new->name = fname;
     new->regex = regex;
     new->icase = icase;
@@ -345,6 +352,9 @@
     else if (!strcasecmp(fname, "server_addr")) {
         new->special_type = SPECIAL_SERVER_ADDR;
     }
+    else if ((SETENVIF_RESPONSE == mode) && !strcasecmp(fname, "content-type")) {
+        new->special_type = SPECIAL_CONTENT_TYPE;
+    }
     else {
         new->special_type = SPECIAL_NOT;
 
         /* Handle fname as a regular expression.
@@ -400,15 +410,35 @@
 static const char *add_setenvif(cmd_parms *cmd, void *mconfig,
                                 const char *args)
 {
-    char *fname;
-
+    char *fname = NULL;
+    int mode = SETENVIF_REQUEST;
+
     /* get header name */
     fname = ap_getword_conf(cmd->pool, args);
-    if (!*fname) {
+    /* is this a mode? */
+
+    if (!fname) {
+        return apr_pstrcat(cmd->pool, "Missing header-field name for ",
+                           cmd->cmd->name, NULL);
+    }
+
+    if (!strcasecmp(fname, "request")) {
+        mode = SETENVIF_REQUEST;
+        fname = NULL;
+    } else if (!strcasecmp(fname, "response")) {
+        mode = SETENVIF_RESPONSE;
+        fname = NULL;
+    }
+
+    if (!fname) {
+        fname = ap_getword_conf(cmd->pool, args);
+    }
+
+    if (!fname) {
         return apr_pstrcat(cmd->pool, "Missing header-field name for ",
                            cmd->cmd->name, NULL);
     }
-    return add_setenvif_core(cmd, mconfig, fname, args);
+    return add_setenvif_core(cmd, mconfig, fname, mode, args);
 }
 
 /*
@@ -418,7 +448,7 @@
  */
 static const char *add_browser(cmd_parms *cmd, void *mconfig, const char *args)
 {
-    return add_setenvif_core(cmd, mconfig, "User-Agent", args);
+    return add_setenvif_core(cmd, mconfig, "User-Agent", SETENVIF_REQUEST, args);
 }
 
 static const command_rec setenvif_module_cmds[] =
@@ -444,7 +474,7 @@
  * signal which call it is by having the earlier one pass a flag to the
  * later one.
  */
-static int match_headers(request_rec *r)
+static int match_headers(request_rec *r, int mode)
 {
     sei_cfg_rec *sconf;
     sei_entry *entries;
@@ -454,7 +484,14 @@
     int i, j;
     char *last_name;
     ap_regmatch_t regm[AP_MAX_REG_MATCH];
-
+    apr_table_t *headers;
+
+    if (SETENVIF_RESPONSE == mode) {
+        headers = r->headers_out;
+    } else {
+        headers = r->headers_in;
+    }
+
     if (!ap_get_module_config(r->request_config, &setenvif_module)) {
         ap_set_module_config(r->request_config, &setenvif_module,
                              SEI_MAGIC_HEIRLOOM);
@@ -468,9 +505,17 @@
     entries = (sei_entry *) sconf->conditionals->elts;
     last_name = NULL;
     val = NULL;
+
     for (i = 0; i < sconf->conditionals->nelts; ++i) {
         sei_entry *b = &entries[i];
-
+
+        if (b->mode != mode) {
+            continue;
+        }
+
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
+                     "setenvif: trying %s", b->name);
+
         /* Optimize the case where a bunch of directives in a row use the
          * same header.  Remember we don't need to strcmp the two header
          * names because we made sure the pointers were equal during
@@ -498,6 +543,10 @@
         case SPECIAL_REQUEST_PROTOCOL:
             val = r->protocol;
             break
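With the patch applied, the first argument to SetEnvIf can select which header table to match against. A hypothetical configuration sketch (the request/response keywords come from the patch's argument parsing; the env-variable names here are invented):

```apache
# First word "request" (or no keyword at all): match request headers, as today
SetEnvIf request User-Agent "MSIE" is_msie

# First word "response": match response headers; "content-type" is
# special-cased to test r->content_type rather than the headers_out table
SetEnvIf response Content-Type "^text/html" is_html_out
```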
bug in ap_set_content_type or somewhere
If I have this configured:

    AddOutputFilterByType DEFLATE text/html

but request something like host/server-status?auto, the response still gets gzipped. This seems to happen because in mod_status.c we do this:

    ap_set_content_type(r, "text/html");

and a little later:

    case STAT_OPT_AUTO:
        ap_set_content_type(r, "text/plain");

and ap_set_content_type calls ap_add_output_filters_by_type(r); so even though we change the content type, we have already added DEFLATE for this request. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
mod_mime, filters and proxy
Any reason why we do this check a lot in mod_mime.c:

    r->proxyreq == PROXYREQ_NONE

Why can't we add filters by type to proxy requests? This is at the top:

    /* X - fix me - See note with NOT_PROXY */

but I don't see a note. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
move ap_sb_handle_t def
Any objections to these two small diffs? Basically, they just move the definition of ap_sb_handle_t from being private in scoreboard.c to being available to everyone in scoreboard.h. This allows modules to get a scoreboard handle for themselves using r->connection->sbh. Thoughts?

--- include/scoreboard.h.bak	2006-05-15 07:44:02.0 -0400
+++ include/scoreboard.h	2006-05-15 07:44:47.0 -0400
@@ -169,7 +169,10 @@
     lb_score *balancers;
 } scoreboard;
 
-typedef struct ap_sb_handle_t ap_sb_handle_t;
+typedef struct {
+    int child_num;
+    int thread_num;
+} ap_sb_handle_t;
 
 AP_DECLARE(int) ap_exists_scoreboard_image(void);
 AP_DECLARE(void) ap_increment_counts(ap_sb_handle_t *sbh, request_rec *r);

--- server/scoreboard.c.bak	2006-05-15 07:42:39.0 -0400
+++ server/scoreboard.c	2006-05-15 07:43:53.0 -0400
@@ -63,11 +63,6 @@
 
 static APR_OPTIONAL_FN_TYPE(ap_proxy_lb_workers) *proxy_lb_workers;
 
-struct ap_sb_handle_t {
-    int child_num;
-    int thread_num;
-};
-
 static int server_limit, thread_limit, lb_limit;
 static apr_size_t scoreboard_size;

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: I think in the long run, a dedicated process is the way to go. I think using a provider architecture would be best; it keeps complexity out of mod_cache. Some module(s) would implement the necessary cache management functions, and mod_cache would push/pull/probe the manager using this interface. The manager may or may not be tied to the storage provider. We may have enough generic interfaces already to allow completely standalone cache managers. At least, that's how I would do it... -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: Moving towards and keeping with the above goals is a far higher priority than simplifying the generic backend cache interface. This response was a perfect summation of why we do *not* run the stock mod_cache here... -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: Seriously, please move this off list to keep the noise out of people's inboxes. Does this discussion belong off-list? I would think this is the type of thing we need to discuss on this list. Is there any consensus as to how to move forward? Do we just leave it as it is currently? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Roy T. Fielding wrote: For the record, Graham's statements were entirely correct; Brian's suggested architecture would slow the HTTP cache. No. It would simplify the existing implementation. The existing implementation, as Graham has noted, is not fully functional. Graham argues - and I'm still mulling it over - that a generic cache architecture would get in the way of making a fully functional HTTP cache. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
RFC: rename mod_cache to mod_http_cache
Not wanting to stir the huge pot o' stuff that is going on here, but what are the thoughts on renaming mod_cache to mod_http_cache? mod_cache is HTTP-specific. This would follow the general idea that mod_proxy uses. I am not suggesting changing any functionality at this time, simply renaming it to a more suitable name. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: RFC: rename mod_cache to mod_http_cache
William A. Rowe, Jr. wrote: Not in 2.2 branch, but in trunk? The issue is that it's half httpd, and half generic. Let me mull this over. Can we separate out the HTTP-specific parts without violating Graham's concerns? My whole original idea was to do just that... I was not fully aware of the issues in the current mod_cache. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Generic cache architecture
Is anyone else interested in having a generic (non-HTTP) cache architecture? I have plenty of cases where I re-invent the wheel for caching various things (IPs, sessions, whatever). It would be nice to have a provider-based architecture for such things. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Generic cache architecture
Gonzalo Arana wrote: I am. How about adding it to apr? How about someone figuring out how to get providers into apr? Doesn't look horribly hard. Perhaps I should ask on apr-devel? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Roy T. Fielding wrote: That is a heck of a lot easier than convincing everyone to dump the current code based on an untested theory. I think the idea may be a lot more tested than you think. Most things I suggest have had an incubation period somewhere... I'm fine with not screwing with the current mod_cache. I just think it should be either renamed or made generic. We may or may not need a generic mod_backend_cache. I have posted a pseudo-implementation that got lost in the latest thread bloat. I can repost if anyone is interested. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Generic cache architecture
Roy T. Fielding wrote: provide this functionality once, and reuse On the contrary, it makes no sense whatsoever to use a generic storage facility for cached HTTP responses in a front-end cache because those responses can only be delivered at maximum speed through a single system call IFF they are not generic. That is why our front-end cache is not, and has never needed to be, a generic cache. A generic cache can deliver objects in a single system call. Think VFS. The generic storage facility may be only a thin wrapper around something like the current mod_disk_cache, or it may be a memcache frontend, or something completely different. Trust me, I am extremely concerned about performance. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: - the cache says "cool, will send my copy upstream. Oops, where has my data gone?" So the cache says "okay, must get content the old-fashioned way" (proxy, filesystem, magic fairies, etc.). Where's the issue? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. ... Still not sure how this is different from what we are proposing. We really want to separate protocol from cache stuff. If we have a revalidate hook for the generic cache, it should address all your concerns. ??? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote: To be HTTP compliant, and to solve thundering herd, we need the following from a cache:

This seems more like a wish list. I just want to separate out the cache and protocol stuff.

- The ability to amend a subkey (the headers) on an entry that is already cached.

mod_http_cache should handle this. To the new mod_cache, it's just another key/value.

- The ability to invalidate a particular cached variant (ie headers + data) in one atomic step, without affecting threads that hold that cached entry open at the time.

mod_http_cache should handle this. Keeping a list of cached variants should use a provider interface as well. mod_cache would handle whatever locking, ref counting, etc., needs to be done, if any.

- The ability to read from a cached object that is still being written to.

Nice to have, but out of scope for what I am proposing. The new mod_cache would be the place to implement this if the underlying provider supports it.

- A guarantee that the result of a broken write (segfault, timeout, connection reset by peer, whatever) will not result in a broken cached entry (ie that the cached entry will eventually be invalidated, and all threads trying to read from it will eventually get an error).

Agreed. The new mod_cache should handle this.

- Certainly separate the protocol from the physical cache, just make sure the physical cache delivers the shopping list above :)

Most of these seem like protocol-specific stuff. -- Brian Akins Lead Systems Engineer CNN Internet Technologies