G'day guys,

Lately Dormando and I have been discussing how to make the backend of memcached pluggable so that external engines can be used.
Rather than rushing ahead and trying to hack up a patch, I've whipped up a very informal draft to set the starting point of this work, based on my initial thoughts and dormando's feedback. I figured it's best to nail down the specification first by getting as much feedback from the memcached community as possible. So it would be great if those who are interested in this feature could suggest improvements to this starting point. After all, the draft is still incomplete, so there should be quite a lot to point out. Open items include how to fetch stats and whether or not to support engine-specific commands (and much more).

The draft is attached to this email (memcached-backend-draft-20080305.txt)

Cheers,
Toru Maesaka
Proposal and thoughts on making the backend of memcached modular
March 5th 2008

AIM
------------------------
In brief, the aim of this work is to allow memcached to be used as both
a high speed memory object caching engine and a dynamic network
platform.

WHAT NEEDS TO BE DONE
------------------------
The current backend related code needs to be tucked behind a common
interface, so that the built-in (slabber related) functions sit behind a
wrapper. Since a memcached module does not necessarily have to be a
storage engine (e.g. it can be a task-queue), the term "container" will
be used to describe the storage layer in this draft.

It is desirable to write the wrapper without having to change the
existing code too much __AND__ achieve a versatile design at the same
time. The potential difficulty here is that, to do its job, an external
library's functions will most likely need a pointer to the block of
memory holding the library's internal data. This means that although it
is preferable not to change the signatures of the existing functions, a
void pointer might have to be added even though it will probably not be
used by the slabber.

There are three possible solutions to this problem.

(1) Use a static struct (no need to change the signatures).
(2) Tell the compiler to expect a variable number of arguments.
(3) Add a meaningless void pointer to the signatures of the builtin
    functions.

My proposed solution is to use (1). memcached will decide whether to use
the built-in slabber functions or to load functions dynamically from an
external library depending on whether a library path is supplied at
startup.

CONTAINER STRUCTURE
------------------------
Instead of directly calling "do_*_item" and "mt_*_item" to perform
storage related operations, a new structure will be introduced to
abstract this away. This is the proposed structure for solution (1) from
above. Notice that there are no explicit functions for incr/decr and
append/prepend operations.

    /* Forward declaration so the members below can refer to the type. */
    typedef struct container container;

    struct container {
        container *(*create_instance)(struct settings *settings,
                                      int version, int *error);
        void (*destroy_instance)(container *cont);
        void *(*item_get)(const void *key, const int nkey);
        int (*item_put)(const void *key, const int nkey,
                        const void *val, const int nval);
        int (*item_del)(const void *key, const int nkey);
    };

As seen above, most operations on the storage layer will go through this
structure. Update related operations, such as incr/decr and
append/prepend, are not included since they are performed by the
existing wrapper functions inside memcached
("process_arithmetic_command" and "process_update_command").

INITIALIZATION
------------------------
The function,

    void slabs_init(const size_t limit, const double factor)

will be put behind the new function,

    bool container_init(const char *modulepath, const char *params)

where modulepath is the path to the shared library (memcached module)
that is to be used instead of slabs. If modulepath is NULL then
memcached will use its original slabs implementation.

MEMCACHED MODULE
------------------------
Any given memcached module must be able to be compiled into both a
static library (e.g. *.a) and a shared library (e.g. *.so). Every module
must supply functions with the same signatures and return types as the
function pointers in the container structure described in the
"CONTAINER STRUCTURE" section.

MODULE LOADING METHOD
------------------------
External engines can be compiled statically with memcached __OR__ linked
dynamically. Using MySQL as an example, there can be two methods of
loading an external engine:

    "-E enginename"
    ...or...
    "-E /path/to/engine"

where the short form would test for a statically linked engine, or a
dynamic engine in a standard path.

PARAMETER STRUCTURE
------------------------
Since external engines usually have their own specific parameter(s),
Tim's idea of adopting a DSN compatible interface can be effective.
Each engine will have a unique `tag' name, and options private to an
engine would have names that start with the tag. For example, starting
up memcached with the slabber can be done like this:

    memcached -E slab -O "slab_maxsize=2000M;slab_ratio=1.2;..."

STATISTICS
------------------------
External engines will most likely have their own engine specific stats.
Therefore a flexible method to retrieve stats from the engine is
required. At this point I haven't come up with a promising solution for
this, but it is possible to extend the current STATS specification. By
extending, I mean that the STATS command would return the combined
result of the server specific stats and the storage engine specific
stats. The client would receive the format it already knows:

    STAT <name> <value>\r\n

terminated by:

    END\r\n

This approach could make things complicated, so it still requires a lot
of thought.

CONCERNS
------------------------
- Whether to support storage engine specific commands.
-- If so, how?
- How to support statistics.
- Operations like incr/decr and append/prepend could slowly corrupt a
  queue.

FEEDBACKS TO
------------------------
Toru Maesaka <[EMAIL PROTECTED]>
Dormando <[EMAIL PROTECTED]>
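
ILLUSTRATION: STATIC STRUCT DISPATCH
------------------------
To make solution (1) from "WHAT NEEDS TO BE DONE" a little more concrete, here is a minimal sketch, not a proposed implementation. It trims the container down to the three item operations and plugs in a toy single-slot engine. Everything named toy_* and the `engine` variable are hypothetical. The return convention (0 on success, -1 on failure) is also an assumption, since the draft does not define one. The point is only that an engine keeping its state in file-static storage needs no extra void pointer in the signatures.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed-down container: only the item operations, using the
 * signatures from the draft. create/destroy are left out for brevity. */
typedef struct {
    void *(*item_get)(const void *key, const int nkey);
    int   (*item_put)(const void *key, const int nkey,
                      const void *val, const int nval);
    int   (*item_del)(const void *key, const int nkey);
} container;

/* Toy single-slot engine (hypothetical). Its state is file-static,
 * so the functions do not need a context pointer. */
static char toy_key[64];
static char toy_val[64];
static int  toy_nkey = -1;              /* -1 means the slot is empty */

static int toy_match(const void *key, const int nkey)
{
    return nkey >= 0 && toy_nkey == nkey &&
           memcmp(toy_key, key, (size_t)nkey) == 0;
}

static void *toy_get(const void *key, const int nkey)
{
    return toy_match(key, nkey) ? toy_val : NULL;
}

static int toy_put(const void *key, const int nkey,
                   const void *val, const int nval)
{
    /* Assumed convention: 0 on success, -1 on failure. */
    if (nkey < 0 || nval < 0 ||
        (size_t)nkey > sizeof toy_key || (size_t)nval >= sizeof toy_val)
        return -1;
    memcpy(toy_key, key, (size_t)nkey);
    memcpy(toy_val, val, (size_t)nval);
    toy_val[nval] = '\0';               /* keep the value printable */
    toy_nkey = nkey;
    return 0;
}

static int toy_del(const void *key, const int nkey)
{
    if (!toy_match(key, nkey))
        return -1;
    toy_nkey = -1;
    return 0;
}

/* The one static struct the server core would call through. */
static const container engine = { toy_get, toy_put, toy_del };
```

The server side would then call engine.item_put("k", 1, "v", 1) and engine.item_get("k", 1) without knowing which engine sits behind the pointers. Swapping engines means filling the struct with different function pointers at startup.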
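
ILLUSTRATION: READING TAGGED OPTIONS
------------------------
The tag-prefixed option string from "PARAMETER STRUCTURE" could be read with something like the sketch below. engine_opt_get is a hypothetical helper, not existing memcached code. The draft does not define quoting, escaping or whitespace handling, so the sketch assumes plain key=value pairs separated by ';' and nothing more.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up "<tag>_<name>" in an option string such
 * as "slab_maxsize=2000M;slab_ratio=1.2" and copy its value into out.
 * Returns 0 on success, -1 if the option is absent or the value does
 * not fit into out (both conventions are assumptions).
 */
int engine_opt_get(const char *opts, const char *tag, const char *name,
                   char *out, size_t outlen)
{
    size_t ntag = strlen(tag);
    size_t nname = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        /* Find the end of the current "key=value" pair. */
        const char *end = strchr(p, ';');
        if (end == NULL)
            end = p + strlen(p);

        const char *eq = memchr(p, '=', (size_t)(end - p));

        /* The key must be exactly tag + '_' + name. */
        if (eq != NULL &&
            (size_t)(eq - p) == ntag + 1 + nname &&
            strncmp(p, tag, ntag) == 0 &&
            p[ntag] == '_' &&
            strncmp(p + ntag + 1, name, nname) == 0) {
            size_t nval = (size_t)(end - eq - 1);
            if (nval >= outlen)
                return -1;              /* value too long for out */
            memcpy(out, eq + 1, nval);
            out[nval] = '\0';
            return 0;
        }

        /* Skip the separator, if any, and move to the next pair. */
        p = (*end == ';') ? end + 1 : end;
    }
    return -1;
}
```

With the draft's example string, engine_opt_get(opts, "slab", "maxsize", buf, sizeof buf) would leave "2000M" in buf. Turning "2000M" into a byte count is left to the engine, since unit suffixes are engine specific.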