These changes will be available at https://github.com/sunillp/sheep-memcached
A README file and a detailed design document are currently available.

On Saturday, 18 May 2013 15:24:36 UTC+5:30, Sunil Patil wrote:
> Hi,
>
> We have made some changes in memcached for doing "data filtering at
> server". We would like to open source this and contribute it to memcached.
> We can provide the patch for review. We have developed some tests (which
> people could try out) that show the benefits of "data filtering at server".
>
> Please let me know your thoughts.
>
> Thanks,
> Sunil
>
> About Changes:
> - With these changes we can do "data filtering at server". This is good
>   for multi-get queries, e.g. queries issued in social networking
>   applications where data related to all friends of a user is read and
>   processed, and only the filtered data is returned to the user. The
>   filtered data is often a very small subset of the data that was read.
> - On a side note not related to the memcached server, we also plan to
>   implement data colocation in the memcached client (all of a user's
>   friends' data will be stored on a single server, or on very few servers),
>   so that very few servers are contacted during query processing. This
>   would further complement data filtering.
>
> Changes:
> 1. Added two new options to the memcached server (-x and -y):
>    # ./memcached -h
>    …
>    -x <num> -y <filter library path>
>        Enable data filtering at server - helps in multi-get operations
>        <num> = 1 - Data filtering at server enable (no deserialized data)
>                    Data deserialized at the time of query processing
>        <num> = 2 - Data filtering at server enable (with deserialized data)
>                    Uses more memory but gives better performance.
>                    Avoids data deserialization at the time of query
>                    processing and saves CPU cycles
>        <filter library path> - path of filter library 'libfilter.so'
>                    This library implements filtering functions and data
>                    serialization/deserialization functions
>
> 2. When filtering is enabled, on a "get" query we read the data of all keys
>    and pass this data to a filtering function implemented in the
>    user-provided library "libfilter.so". The "dlopen"/"dlsym" framework is
>    used for opening the user-provided library and calling the user-provided
>    functions. The user has to define only three functions: "deserialize()",
>    "free_msg()" and "readfilter()". We plan to introduce a new command
>    "fget" (filter get) for this functionality, with which the client could
>    additionally pass arguments to the filter function and could choose among
>    multiple filtering functions (i.e. work with multiple filter libraries).
>
> Currently the changes are implemented for the Linux platform (tested on
> RHEL 5.6). The changes were made on memcached version "memcached-1.4.13".
> They cover the ASCII protocol only (not the binary protocol) and have no
> impact on "gets" (get with CAS) functionality.
>
> Performance enhancement:
> Some of the advantages of this are (for multi-get queries with the
> characteristics mentioned above):
> - Better throughput and latency under normal query-load conditions => can
>   result in client consolidation.
> - Since most data is filtered at the server, very little data flows over
>   the network (from server to client). This avoids the network congestion
>   (and hence the latencies/delays caused by it) which might happen under
>   high query load with normal memcached.
> - Performance with these changes (for multi-get queries with the
>   characteristics mentioned above) is 3x to 7x better than normal
>   memcached, as shown below.
>
> Tests performed:
> - Setup details:
>   1 memcached server, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet
>   card
>   1 memcached client, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet
>   card
> - Test details:
>   There are one million users (each user represented by a unique key). Each
>   user has 100 friends. Each user has 30 records of type (userId,
>   articleId, timestamp) stored as the value. On a READ query for a user,
>   all records associated with all friends of that user are READ, sorted in
>   increasing order of timestamp, and the top/latest 10 records across all
>   friends are returned as output. So basically on a READ query 100 keys
>   (100*30=3000 records) are read, 3000 records are sorted, and the top 10
>   records are returned as output.
>
> - For normal memcached, all these operations (READING 100 keys, sorting
>   3000 records, and finding the top 10 records) are done on the client.
> - With our changes (where filtering (sorting) happens on the server), 100
>   keys are read on the server, 3000 records are sorted locally by the
>   filtering function (implemented in the user-provided library; the same
>   processing is done on the server as would be done on the client), and
>   only 10 records are sent to the client.
>
> We created a multithreaded CLIENT application which issues READ queries
> asynchronously (multiple threads are used for issuing and processing READ
> queries). READ queries are issued for a varying number of users, from
> 1 user to 30000 users. The time taken to complete these queries is used to
> compute throughput and latency. See the attachments for perf. results.
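For anyone who wants to see the test workload concretely, here is a minimal Python sketch of the READ query described above. It does not talk to memcached and is not the `libfilter.so` interface (the signatures of `deserialize()`, `free_msg()` and `readfilter()` are not given in this post). All function names below are hypothetical, the store is a plain in-memory dict, and returning the latest records first is an assumption. It only illustrates why filtering at the server cuts the reply from 3000 records to 10 while giving the same answer.

```python
import heapq
import random
from typing import Dict, List, Tuple

# One record as described in the post: (userId, articleId, timestamp).
Record = Tuple[int, int, int]


def build_store(num_users: int, records_per_user: int, seed: int = 0) -> Dict[int, List[Record]]:
    """Hypothetical stand-in for the cache: one key per user, a list of records as value."""
    rng = random.Random(seed)
    return {
        user_id: [
            (user_id, rng.randrange(10**6), rng.randrange(10**9))
            for _ in range(records_per_user)
        ]
        for user_id in range(num_users)
    }


def multi_get(store: Dict[int, List[Record]], keys: List[int]) -> List[List[Record]]:
    """Plain multi-get: return the full value of every requested key."""
    return [store[key] for key in keys]


def latest_records(values: List[List[Record]], limit: int = 10) -> List[Record]:
    """Filter step: keep the `limit` records with the largest timestamps, latest first."""
    all_records = (record for value in values for record in value)
    return heapq.nlargest(limit, all_records, key=lambda record: record[2])


def client_side_read(store, friend_keys, limit=10):
    """Normal memcached: all values cross the network, the client filters."""
    values = multi_get(store, friend_keys)
    records_transferred = sum(len(value) for value in values)
    return latest_records(values, limit), records_transferred


def server_side_read(store, friend_keys, limit=10):
    """Filtering at server: only the filtered records cross the network."""
    result = latest_records(multi_get(store, friend_keys), limit)
    return result, len(result)


if __name__ == "__main__":
    store = build_store(num_users=1000, records_per_user=30)
    friends = list(range(100))  # 100 friends, 30 records each
    client_result, client_sent = client_side_read(store, friends)
    server_result, server_sent = server_side_read(store, friends)
    print("same answer:", client_result == server_result)
    print("records sent to client:", client_sent, "vs", server_sent)
```

The sketch counts records rather than bytes and ignores serialization cost, so it says nothing about the 3x to 7x figure; that comes from the measurements in the attachments.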
