Hi -- I wanted to get a little experience with the "socache" (small object cache) providers which Joe Orton recently refactored out of mod_ssl and so I've written a pair of modules, mod_shmap and mod_socache_zookeeper, which are available here:
http://people.apache.org/~chrisd/projects/shared_map/

The mod_shmap module allows HTTP clients to access any configured socache provider's storage using GET, PUT, and DELETE requests; the URI path is used as the ID key. This might be a useful way for clients without a native API to access these providers. My immediate purpose, though, was to write a module that serves as a test harness for the various providers and as an example of how to use them without making people slog through mod_ssl.

The mod_socache_zookeeper module is an experimental socache provider that uses ZooKeeper as its data store.[1][2] ZooKeeper is a distributed, highly reliable coordination service similar to Google's Chubby lock service.[3] Like Chubby, it appears to implement the Paxos consensus algorithm.[4][5][6] (ZooKeeper's documentation and code comments are a little sketchy in this regard, however.) A key caveat about mod_socache_zookeeper is that it simply ignores expiry times at the moment. An enhancement would be a background thread that periodically culled expired nodes.

Here's an example configuration that maps all HTTP requests to ZooKeeper, except for those under /shm, which go to a shared-memory cyclic buffer cache:

    SharedMapProvider zookeeper zk1.example.com:7000

    <Location /shm>
        SharedMapProvider shmcb /tmp/shm
    </Location>

    SetHandler shmap

Based on the experience of writing these modules, I have a few thoughts and notes for discussion, in no particular order.

I confess the name "socache" still sits oddly with me, both because of its similarity to mod_so and .so files, and because I continue to doubt that everyone will treat these providers as always implementing caches only. It's true that some providers will always impose data size limits, but that could be something the caller can interrogate and then reject or require as necessary. It would also be valuable, I think, to disambiguate these providers from the different functionality of mod_cache and its related modules.
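For a sense of how mod_shmap's method-to-operation mapping works, here's a rough sketch; all the names below are illustrative rather than the actual module source, and the provider call signatures are approximate:

```c
/* Illustrative sketch only -- not the actual mod_shmap source; the
 * socache provider signatures shown here are approximate. */
static int shmap_handler(request_rec *r)
{
    shmap_dir_conf *conf;                  /* hypothetical per-dir config */
    const unsigned char *id = (const unsigned char *)r->uri;
    unsigned int idlen = (unsigned int)strlen(r->uri);
    unsigned char buf[8192];               /* arbitrary size for the sketch */
    unsigned int len = sizeof(buf);
    apr_time_t expiry = 0;                 /* expiry handling elided */
    apr_status_t rv;

    if (strcmp(r->handler, "shmap") != 0) {
        return DECLINED;
    }
    conf = ap_get_module_config(r->per_dir_config, &shmap_module);

    switch (r->method_number) {
    case M_GET:      /* GET    -> retrieve(), URI path as the ID key */
        rv = conf->provider->retrieve(conf->instance, r->server,
                                      id, idlen, buf, &len, r->pool);
        /* ... on success, write buf back to the client ... */
        break;
    case M_PUT:      /* PUT    -> store(), request body as the value */
        /* ... read the request body into buf/len, then: ... */
        rv = conf->provider->store(conf->instance, r->server,
                                   id, idlen, expiry, buf, len);
        break;
    case M_DELETE:   /* DELETE -> delete() */
        conf->provider->delete(conf->instance, r->server,
                               id, idlen, r->pool);
        break;
    default:
        return HTTP_METHOD_NOT_ALLOWED;
    }
    /* ... map rv to an appropriate HTTP status ... */
}
```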
So I'd again suggest modules/shmap (for "shared map") as a possible location and naming scheme. Another very minor naming issue is that AP_SOCACHE_FLAG_NOTMPSAFE reads as "no temp safe" at first glance; perhaps NOT_MP_SAFE or NOT_ATOMIC would be more readable?

I ran into three particular inconsistencies while coding which I think could be addressed quickly:

a) The store() call should take an apr_pool_t argument, like retrieve() and delete(), for temporary allocations.

b) The delete() call should return an apr_status_t like the other two, since complex providers may fail here.

c) All providers should always return APR_NOTFOUND from retrieve() and delete() when the data is not found. Currently at least the shmcb provider returns APR_EGENERAL in this case, which makes it impossible to distinguish the "not found" case from serious errors.

Another minor problem is that many of the providers betray their mod_ssl origins in their error messages, such as "SSLSessionCache: Failed to Create Server" and so forth.

Following my instinct that some users may not care about the caching/expiry side of things, I think allowing expiry = 0 to mean "retain as long as possible" would be useful.

The namespace and hints arguments to the init() call are somewhat underused and also rather specific to the existing providers. I briefly thought I might be able to pass reslist min/max values in the hints, but the ap_socache_hints structure isn't the right place; instead they'd need to be packed into the single string argument passed to create(). At the moment the memcached provider just hard-codes these values, and so does my ZooKeeper provider. I wonder if there's a way to open this up a little and make per-provider-instance configuration easier, but I don't have a specific idea here.

In a related vein, I think a naive user is going to find invoking create() and init() at the right time a little tricky.
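Concretely, the adjusted vtable entries I'm suggesting for (a) and (b) would look something like the following; this is only a sketch against my reading of the current API, with surrounding members elided:

```c
/* Sketch of the suggested signature changes -- not the current API. */
struct ap_socache_provider_t {
    /* ... */

    /* (a) store() gains a pool for temporary allocations,
     *     matching retrieve() and delete(): */
    apr_status_t (*store)(ap_socache_instance_t *instance, server_rec *s,
                          const unsigned char *id, unsigned int idlen,
                          apr_time_t expiry,
                          unsigned char *data, unsigned int datalen,
                          apr_pool_t *pool);

    /* (b) delete() returns a status, since complex providers can
     *     legitimately fail here: */
    apr_status_t (*delete)(ap_socache_instance_t *instance, server_rec *s,
                           const unsigned char *id, unsigned int idlen,
                           apr_pool_t *pool);

    /* (c) and both retrieve() and delete() would return APR_NOTFOUND,
     *     never APR_EGENERAL, when the ID is simply absent. */
};
```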
The create() calls usually create an instance structure and parse arguments, but don't otherwise initialize. For their messages to make it to the console at startup, one needs to invoke create() in the check_config phase, not post_config, since by then stderr has been redirected to the logs. In mod_ssl, create() is called with s->process->pool, which means that if a provider is later unloaded, the structures it allocated in create() remain around forever. Global mutexes are also created out of s->process->pool, and similarly remain around indefinitely. In mod_shmap I use pconf exclusively to try to iron out these issues.

Meanwhile, init() should ideally be called only during the second and subsequent configuration passes, so you need to do some magic with userdata in s->process->pool to skip the initial configuration pass. Here pconf is used by both mod_ssl and mod_shmap, and that's good, except that it does introduce some potential (if unlikely) interactions between graceful restarts and shared-memory segments on some platforms. Specifically, if APR is using a named segment from shmget(), and ftok() returns a different key after a graceful restart, then new processes attach to a new segment while lingering processes from the previous generation write to the old one. This is really a complexity of shared memory, I think, and should be addressed (if at all) within the provider; pconf is still the right pool to pass to init() generally, I believe.

Finally, just as a note, I used some tricks from mod_dbd in mod_socache_zookeeper. In particular, rather than opening connections in init(), init() just builds a singly-linked list of instances, and a child_init hook then creates a reslist of connections to ZooKeeper in each child process. This makes destroy() a no-op, among other things. However, it does require working around a problem mod_dbd used to have: avoiding both leaked resource structures and double-free segfaults on shutdown.
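A minimal sketch of that child_init/reslist arrangement follows; the connection helpers, structure names, and min/max values are all hypothetical, not taken from the actual module:

```c
/* Sketch of the mod_dbd-style pattern: defer connections to child_init.
 * zk_svr, zk_connect, and zk_close are hypothetical names. */
static apr_status_t zk_construct(void **res, void *params, apr_pool_t *pool)
{
    zk_svr *svr = params;
    *res = zk_connect(svr->hosts, pool);   /* hypothetical helper */
    return *res ? APR_SUCCESS : APR_EGENERAL;
}

static apr_status_t zk_destruct(void *res, void *params, apr_pool_t *pool)
{
    zk_close(res);                         /* hypothetical helper */
    return APR_SUCCESS;
}

static void zk_child_init(apr_pool_t *p, server_rec *s)
{
    zk_svr *svr;

    /* Walk the singly-linked list of instances built in init() and
     * give each one a per-child reslist of ZooKeeper connections. */
    for (svr = zk_instance_list; svr; svr = svr->next) {
        apr_reslist_create(&svr->conns,
                           1 /* min */, 2 /* soft max */, 4 /* hard max */,
                           0 /* ttl */,
                           zk_construct, zk_destruct, svr, p);
    }
}
```

Request-time code then checks connections out and back in with apr_reslist_acquire() and apr_reslist_release(); the wrinkle is what happens to these resources when the child pool is torn down on shutdown.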
This problem stems from the fact that resources' sub-pools are destroyed prior to the reslist's cleanup function being invoked, at which point the cleanup then invokes each resource's destructor.[7][8]

OK, well, I hope someone gets some utility out of these things, and please email me with any bugs or suggestions.

Chris.

[1] http://zookeeper.sourceforge.net/
[2] http://zookeeper.wiki.sourceforge.net/
[3] http://labs.google.com/papers/chubby.html
[4] http://labs.google.com/papers/paxos_made_live.html
[5] http://research.microsoft.com/users/lamport/pubs/pubs.html#paxos-simple
[6] http://en.wikipedia.org/wiki/Paxos_algorithm
[7] http://mail-archives.apache.org/mod_mbox/apr-dev/200612.mbox/[EMAIL PROTECTED]
[8] http://mail-archives.apache.org/mod_mbox/apr-dev/200609.mbox/[EMAIL PROTECTED]

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263 E4DE C8E3 FA36 366A 375B