proposal on multi-storage stuff

Toru Maesaka Wed, 05 Mar 2008 01:30:44 -0800

G'day guys

Lately Dormando and I have been discussing a bit on how to make the
backend of memcached pluggable so that external engines can be used.


Rather than rushing ahead and trying to hack up a patch, I've whipped
up a very informal draft to set the starting point of this work based on
my initial thoughts and dormando's feedback. I figured it's best to nail
down the specification first by getting as much feedback from the
memcached community as possible.

So it would be great if those who are interested in this feature could
suggest improvements that can be made to this starting point.

After all, this draft is still incomplete, so there should be quite a lot of
things that can be pointed out. This includes things like fetching stats
and whether or not to support engine specific commands (and much more).

The draft is attached to this email
  (memcached-backend-draft-20080305.txt)

Cheers,
Toru Maesaka

Proposal and thoughts on making the backend of memcached modular
                                                           March 5th 2008

AIM
------------------------
In brief, the aim of this work is to allow memcached to be used as both 
a high speed memory object caching engine and a dynamic network platform.


WHAT NEEDS TO BE DONE
------------------------
The current backend related code needs to be tucked behind a common interface,
which means the builtin (slabber related) functions must be moved behind a 
wrapper. 

Since a memcached module does not necessarily have to be a storage engine 
(e.g. can be a task-queue), the term "container" will be used to describe the 
storage layer in this draft.

It is desirable to write the wrapper without having to change the existing 
code too much __AND__ achieve a versatile design at the same time. The 
potential difficulty here is that, to do their job, the library functions will 
most likely require a pointer to the block of memory holding their internal 
data.

This means that though it is favorable to not change the signatures of the 
existing functions, a void pointer might have to be added even though it will 
probably not be used by the slabber.

There are three solutions to this problem. 

(1) Use a static struct (no need to change the signatures).
(2) Tell the compiler to expect variable number of arguments.
(3) Add a meaningless void pointer to the signatures of the builtin functions.

My proposed solution is to use (1).
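
As a minimal sketch of what solution (1) could look like, assume a 
hypothetical module that keeps its internal data in a file-scope static 
struct, so its functions need no extra void pointer. The names module_state, 
demo_put, demo_count and demo_bytes are made up for illustration and are not 
part of memcached.

```c
#include <stddef.h>

/* Hypothetical module-private state. Because it is static, every function
 * in this file can reach it without an extra void* parameter. */
static struct {
    size_t item_count;   /* number of successful puts */
    size_t bytes_stored; /* total value bytes accepted */
} module_state = {0, 0};

/* Same shape as a key/value "put": no context pointer in the signature. */
int demo_put(const void *key, const int nkey, const void *val, const int nval)
{
    if (key == NULL || val == NULL || nkey <= 0 || nval < 0)
        return -1;
    module_state.item_count++;
    module_state.bytes_stored += (size_t)nval;
    return 0;
}

/* Accessors read the same static state. */
size_t demo_count(void) { return module_state.item_count; }
size_t demo_bytes(void) { return module_state.bytes_stored; }
```

The trade-off of this approach is that a static struct gives one instance of 
the state per process, and any access from multiple threads still needs its 
own locking.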

memcached will decide at startup whether to use the builtin slabber functions 
or to load the functions dynamically from an external library, depending on 
whether a library path is supplied.


CONTAINER STRUCTURE
------------------------
Instead of directly calling "do_*_item" and "mt_*_item" to perform storage 
related operations, a new structure will be introduced to abstract this away. 

This is the proposed structure for solution (1) from above. Notice that 
there are no explicit functions for incr/decr and append/prepend operations.

  /* Forward declaration so the structure can refer to its own type. */
  typedef struct container container;

  struct container {
    container *(*create_instance)(struct settings *settings, int version, 
                int *error);
    void (*destroy_instance)(container *cont);

    void *(*item_get)(const void *key, const int nkey);
    int (*item_put)(const void *key, const int nkey, const void *val, 
                    const int nval);
    int (*item_del)(const void *key, const int nkey);
  };

As seen above, most operations on the storage layer will come from the above
structure. Update related operations, such as incr/decr, append/prepend are 
not included since they are performed by the existing wrapper functions 
inside memcached ("process_arithmetic_command", and "process_update_command").
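
To illustrate how calls could be routed through such a structure, here is a 
rough sketch with a toy engine. It uses a trimmed copy of the structure (item 
operations only), and the toy_* names, the fixed-size table and its limits are 
assumptions made for this example, not part of the draft.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed copy of the draft's structure: only the three item operations. */
typedef struct container {
    void *(*item_get)(const void *key, const int nkey);
    int (*item_put)(const void *key, const int nkey, const void *val,
                    const int nval);
    int (*item_del)(const void *key, const int nkey);
} container;

/* Hypothetical toy engine: a tiny fixed-size table with linear search. */
#define TOY_SLOTS 8
#define TOY_MAX 32
static struct toy_slot {
    int used, nkey;
    char key[TOY_MAX];
    char val[TOY_MAX];
} toy_table[TOY_SLOTS];

static struct toy_slot *toy_find(const void *key, int nkey)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (toy_table[i].used && toy_table[i].nkey == nkey &&
            memcmp(toy_table[i].key, key, (size_t)nkey) == 0)
            return &toy_table[i];
    return NULL;
}

static void *toy_get(const void *key, const int nkey)
{
    struct toy_slot *s = toy_find(key, nkey);
    return s ? s->val : NULL;
}

static int toy_put(const void *key, const int nkey, const void *val,
                   const int nval)
{
    if (nkey <= 0 || nkey > TOY_MAX || nval < 0 || nval >= TOY_MAX)
        return -1;
    struct toy_slot *s = toy_find(key, nkey);   /* overwrite if present */
    for (int i = 0; s == NULL && i < TOY_SLOTS; i++)
        if (!toy_table[i].used)
            s = &toy_table[i];                  /* else take a free slot */
    if (s == NULL)
        return -1;                              /* table full */
    s->used = 1;
    s->nkey = nkey;
    memcpy(s->key, key, (size_t)nkey);
    memcpy(s->val, val, (size_t)nval);
    s->val[nval] = '\0';
    return 0;
}

static int toy_del(const void *key, const int nkey)
{
    struct toy_slot *s = toy_find(key, nkey);
    if (s == NULL)
        return -1;
    s->used = 0;
    return 0;
}

/* The server would call through this table instead of do_*_item directly. */
const container toy_container = { toy_get, toy_put, toy_del };
```

A caller holding a pointer c to toy_container would then write 
c->item_put("a", 1, "hello", 5) and c->item_get("a", 1) without knowing which 
engine sits behind the pointers.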


INITIALIZATION
------------------------
The function, 

  void slabs_init(const size_t limit, const double factor) 

will be put behind the new function,

  bool container_init(const char *modulepath, const char *params)

where modulepath is the path to the shared library (memcached module) that is 
to be used instead of slabs. If modulepath is NULL then memcached will use its 
original slabs implementation. 
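
A rough sketch of how container_init might branch, assuming POSIX dlopen() is 
used for the external case. slabs_init_stub stands in for the real 
slabs_init(), the exported symbol name "container_entry" is hypothetical, the 
limit and factor values are illustrative, and params is left uninterpreted.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the real slabs_init(); records that the builtin path ran. */
static bool builtin_ready = false;
static void slabs_init_stub(const size_t limit, const double factor)
{
    (void)limit;
    (void)factor;
    builtin_ready = true;
}

static void *module_handle = NULL; /* dlopen handle of the external module */
static void *module_entry = NULL;  /* address of its entry symbol */

bool container_init(const char *modulepath, const char *params)
{
    (void)params; /* not interpreted in this sketch */

    if (modulepath == NULL) {
        /* No module given: fall back to the builtin slabber. */
        slabs_init_stub(64 * 1024 * 1024, 1.25); /* illustrative values */
        return true;
    }

    module_handle = dlopen(modulepath, RTLD_NOW | RTLD_LOCAL);
    if (module_handle == NULL)
        return false;

    /* "container_entry" is a hypothetical symbol name for this example. */
    module_entry = dlsym(module_handle, "container_entry");
    if (module_entry == NULL) {
        dlclose(module_handle);
        module_handle = NULL;
        return false;
    }
    return true;
}
```

On older glibc versions this needs to be linked with -ldl.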


MEMCACHED MODULE
------------------------
Any given memcached module code must be able to be compiled into both a 
static library (e.g. *.a) and a shared library (e.g. *.so).

Every module must supply the functions with the same signature and return 
types as the function pointers in the container structure described in the 
"CONTAINER STRUCTURE" section.


MODULE LOADING METHOD
------------------------
External engines can be compiled statically with memcached __OR__ linked 
dynamically. Using MySQL as an example, there can be two methods of loading 
an external engine:

"-E enginename"
...or...
"-E /path/to/engine"

where the short form (a bare engine name) would test for a statically linked 
engine, or a dynamic engine in a standard path. 
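
One way the two forms could be told apart is sketched below. The rule "an 
argument containing a slash is a path" is an assumption, as are the ENGINE_DIR 
location, the ".so" suffix and the contents of the static engine list.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical standard directory for dynamic engines. */
#define ENGINE_DIR "/usr/lib/memcached"

enum engine_source { ENGINE_STATIC, ENGINE_DYNAMIC };

/* Engines assumed to be compiled into the binary for this example. */
static const char *const static_engines[] = { "slab" };

/* Decide how to load the engine named by the -E argument.
 * For ENGINE_DYNAMIC, path receives the library path to load. */
enum engine_source resolve_engine(const char *spec, char *path, size_t pathlen)
{
    size_t n = sizeof static_engines / sizeof static_engines[0];

    if (strchr(spec, '/') != NULL) {
        /* "-E /path/to/engine": use the path as given. */
        snprintf(path, pathlen, "%s", spec);
        return ENGINE_DYNAMIC;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(spec, static_engines[i]) == 0) {
            /* "-E enginename" matching a statically linked engine. */
            if (pathlen > 0)
                path[0] = '\0';
            return ENGINE_STATIC;
        }
    }
    /* Otherwise look for a dynamic engine in the standard path. */
    snprintf(path, pathlen, "%s/%s.so", ENGINE_DIR, spec);
    return ENGINE_DYNAMIC;
}
```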


PARAMETER STRUCTURE
------------------------
Since external engines usually have their own specific parameter(s), Tim's 
idea of adopting a DSN compatible interface can be effective. Each engine will 
have a unique `tag' name, and options private to an engine would have names 
that start with the tag.

For example, starting up memcached with slabber can be done like this:

  memcached -E slab -O "slab_maxsize=2000M;slab_ratio=1.2;..."
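
A small sketch of how such an option string could be read. The function names 
are hypothetical, and treating "_" as the separator between the tag and the 
rest of the option name is an assumption based on the example above.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of option `name` from a "k=v;k=v" string into out.
 * Returns 0 on success, -1 if the option is absent or out is too small. */
int option_lookup(const char *opts, const char *name, char *out, size_t outlen)
{
    const size_t nlen = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "name=" exactly, so "slab_max" does not hit "slab_maxsize". */
        if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = len - nlen - 1;
            if (vlen >= outlen)
                return -1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return -1;
}

/* An option is treated as private to an engine when its name starts
 * with "<tag>_" (assumed separator). Returns 1 or 0. */
int option_belongs_to(const char *name, const char *tag)
{
    const size_t tlen = strlen(tag);
    return strncmp(name, tag, tlen) == 0 && name[tlen] == '_';
}
```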


STATISTICS
------------------------
External engines will most likely have engine specific stats information 
of their own. Therefore a flexible method to retrieve stats from the 
engine is required.

At this point I haven't come up with a promising solution for this, but it 
is possible to extend the current STATS specification. By "extend" I mean 
that the STATS command would return the combined result of the server 
specific stats and the storage engine specific stats.

The client would receive the format it already knows:

  STAT <name> <value>\r\n

terminated by:

  END\r\n

This approach could make things complicated, so it still requires a lot of 
thought.
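
For discussion, a minimal sketch of the combined response: server stats first, 
then engine stats, then the terminator. The stat_entry type and both function 
names are hypothetical, and how an engine would hand over its stats is left 
open here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name/value pair for one stat line. */
struct stat_entry {
    const char *name;
    const char *value;
};

/* Append "STAT <name> <value>\r\n" lines starting at offset off.
 * Returns the new offset, or -1 if the buffer is too small. */
static int append_stats(char *buf, size_t buflen, size_t off,
                        const struct stat_entry *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, buflen - off, "STAT %s %s\r\n",
                         s[i].name, s[i].value);
        if (w < 0 || (size_t)w >= buflen - off)
            return -1;
        off += (size_t)w;
    }
    return (int)off;
}

/* Build the full reply: server stats, engine stats, then END.
 * Returns the reply length, or -1 if the buffer is too small. */
int format_stats(char *buf, size_t buflen,
                 const struct stat_entry *server, size_t nserver,
                 const struct stat_entry *engine, size_t nengine)
{
    int off = append_stats(buf, buflen, 0, server, nserver);
    if (off < 0)
        return -1;
    off = append_stats(buf, buflen, (size_t)off, engine, nengine);
    if (off < 0)
        return -1;
    int w = snprintf(buf + off, buflen - (size_t)off, "END\r\n");
    if (w < 0 || (size_t)w >= buflen - (size_t)off)
        return -1;
    return off + w;
}
```

Since the client only sees more STAT lines before END, existing clients should 
not need a new response format, though name clashes between server and engine 
stats would still have to be sorted out.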


CONCERNS
------------------------
 - Whether to support storage engine specific commands.
 -- If so, how?
 - How to support engine specific statistics.
 - Operations like incr/decr append/prepend could slowly corrupt a queue.


FEEDBACK TO
------------------------
Toru Maesaka <[EMAIL PROTECTED]>
Dormando <[EMAIL PROTECTED]>
