Re: Some changes in memcached to efficiently process multi-get queries

Sunil Patil Thu, 27 Jun 2013 05:18:32 -0700

These changes will be available at 
https://github.com/sunillp/sheep-memcached


A README file and a detailed design document are currently available.

On Saturday, 18 May 2013 15:24:36 UTC+5:30, Sunil Patil wrote:
>
> Hi,
>
> We have made some changes to memcached to support "data filtering at the 
> server". We would like to open-source them and contribute them to 
> memcached, and we can provide the patch for review. We have also developed 
> some tests, which people can try out, that show the benefits of data 
> filtering at the server.
>
> Please let me know your thoughts.
>
> Thanks,
> Sunil
>
> About Changes:
> - With these changes we can do "data filtering at server". This helps 
> multi-get queries, for example queries issued by social networking 
> applications, where data related to all friends of a user is read and 
> processed, and only the filtered data is returned to the user. The 
> filtered data is often a very small subset of the data that was read.
> - As a side note, unrelated to the memcached server, we also plan to 
> implement data colocation in the memcached client (all of a user's 
> friends' data will be stored on a single server, or on very few servers), 
> so that very few servers are contacted during query processing. This 
> would further complement data filtering.
>
> Changes:
> 1. Added two new options to memcached server (-x and -y):
> # ./memcached -h
> …
> -x <num> -y <filter library path>
>               Enable data filtering at server - helps in multi-get operations
>               <num> = 1 - Data filtering at server enable (no deserialized data)
>                           Data deserialized at the time of query processing
>               <num> = 2 - Data filtering at server enable (with deserialized data)
>                           Uses more memory but gives better performance. Avoids data
>                           deserialization at the time of query processing and
>                           saves CPU cycles
>               <filter library path> - path of filter library 'libfilter.so'
>                           This library implements filtering functions and data
>                           serialization/deserialization functions
>
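The trade-off between the two -x modes can be illustrated with a toy model. This is only a sketch under assumptions: the class name, the method names, and the record format (one "userId,articleId,timestamp" triple per line) are hypothetical and not taken from the patch, and it assumes mode 2 keeps a deserialized copy alongside the raw value, which is one plausible reading of "uses more memory".

```python
# Toy model of the two -x modes; all names and the value format are hypothetical.
from typing import Dict, List, Tuple

Record = Tuple[int, ...]  # assumed layout: (userId, articleId, timestamp)


def deserialize(raw: bytes) -> List[Record]:
    # Assumed format: one "userId,articleId,timestamp" triple per line.
    return [tuple(int(field) for field in line.split(","))
            for line in raw.decode("ascii").splitlines()]


class FilteringCache:
    def __init__(self, mode: int) -> None:
        if mode not in (1, 2):
            raise ValueError("mode must be 1 or 2")
        self.mode = mode
        self.raw: Dict[str, bytes] = {}
        self.parsed: Dict[str, List[Record]] = {}  # populated only in mode 2

    def set(self, key: str, value: bytes) -> None:
        self.raw[key] = value
        if self.mode == 2:
            # Mode 2: parse once at write time and keep the extra copy.
            self.parsed[key] = deserialize(value)

    def records(self, key: str) -> List[Record]:
        if self.mode == 2:
            return self.parsed[key]        # no parsing cost at query time
        return deserialize(self.raw[key])  # mode 1: parse on every query
```

In this model both modes return the same records; mode 2 trades the memory held in `parsed` for skipping `deserialize()` on each read.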
> 2. When filtering is enabled, a "get" query reads the data for all keys 
> and passes it to a filtering function implemented in a user-provided 
> library, "libfilter.so". The "dlopen"/"dlsym" mechanism is used to open the 
> library and call its functions. The user has to define only three 
> functions: "deserialize()", "free_msg()" and "readfilter()". We plan to 
> introduce a new command, "fget" (filter get), for this functionality, with 
> which a client could additionally pass arguments to the filter function 
> and choose among multiple filtering functions (i.e. work with multiple 
> filter libraries).
>
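The flow described in point 2 can be modelled roughly as follows. This is a Python stand-in, not the patch's C code: the post names the three hooks but not their signatures, so the argument and return types below, along with `FilterLibrary` and `handle_get`, are assumptions made for illustration.

```python
# Hypothetical model of the filtered "get" path; signatures are assumed.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FilterLibrary:
    """Stand-in for the three hooks a filter library is said to provide."""
    deserialize: Callable[[bytes], Any]
    free_msg: Callable[[Any], None]
    readfilter: Callable[[List[Any]], bytes]


def handle_get(store: Dict[str, bytes], keys: List[str],
               lib: FilterLibrary) -> bytes:
    # Read every requested key that is present, as a multi-get would.
    msgs = [lib.deserialize(store[k]) for k in keys if k in store]
    try:
        # A single filtered reply replaces the per-key values.
        return lib.readfilter(msgs)
    finally:
        # Release each deserialized message once the reply is built.
        for msg in msgs:
            lib.free_msg(msg)


# Toy library: values are ASCII integers and the filter returns the largest.
freed: List[int] = []
toy = FilterLibrary(
    deserialize=lambda raw: int(raw.decode("ascii")),
    free_msg=freed.append,
    readfilter=lambda msgs: str(max(msgs)).encode("ascii"),
)
store = {"a": b"3", "b": b"42", "c": b"7"}
reply = handle_get(store, ["a", "b", "missing"], toy)
```

Passing the hooks in as callables plays the role that looking up symbols in the shared library plays in the real server; a different filter is just a different `FilterLibrary` value.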
> Currently the changes are implemented for the Linux platform (tested on 
> RHEL 5.6) and are based on memcached version "memcached-1.4.13". They 
> cover the ASCII protocol only (not the binary protocol) and have no impact 
> on "gets" (get with CAS) functionality.
>
> Performance enhancement:
> Some of the advantages (for multi-get queries with the characteristics 
> mentioned above):
> - Better throughput and latency under normal query load, which can allow 
> client consolidation.
> - Since most data is filtered at the server, far less data flows over the 
> network from server to client. This avoids the network congestion, and the 
> resulting latencies and delays, that might occur under high query load 
> with normal memcached.
>
> - Performance with these changes (for multi-get queries with the 
> characteristics mentioned above) is 3x to 7x better than normal 
> memcached, as shown below.
>
> Tests performed:
> - Setup details:
> 1 memcached server, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet 
> card
> 1 memcached client, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet 
> card
> - Test details:
> There are one million users, each represented by a unique key. Each user 
> has 100 friends and 30 records of type (userId, articleId, timestamp) 
> stored as the value. On a READ query for a user, all records associated 
> with all friends of that user are read and sorted in increasing order of 
> timestamp, and the latest 10 records across all friends are returned as 
> output. So each READ query reads 100 keys (100*30 = 3000 records), sorts 
> the 3000 records, and returns the top 10.
>
> - With normal memcached, all of these operations (reading 100 keys, 
> sorting 3000 records, and finding the top 10 records) are done on the 
> client.
> - With our changes, where filtering (sorting) happens on the server, the 
> server reads the 100 keys, the filtering function in the user-provided 
> library sorts the 3000 records locally (the same processing that the 
> client would otherwise do), and only 10 records are sent to the client.
>
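The filtering step for this test workload could look something like the sketch below. The post describes sorting all 3000 records and taking the top 10; this version uses `heapq.nlargest` instead of a full sort, which selects the same 10 newest records. The function signature, the in-memory record tuples, and the randomly generated data are assumptions for illustration, not the contents of the actual filter library.

```python
# Sketch of a "latest 10 across all friends" filter; signature is assumed.
import heapq
import random
from typing import List, Tuple

Record = Tuple[int, int, int]  # (userId, articleId, timestamp)


def readfilter(per_friend: List[List[Record]], limit: int = 10) -> List[Record]:
    # Flatten every friend's records and keep the `limit` newest by timestamp.
    merged = (rec for records in per_friend for rec in records)
    return heapq.nlargest(limit, merged, key=lambda rec: rec[2])


# Synthetic data shaped like the test: 100 friends with 30 records each.
rng = random.Random(0)
friends = [[(uid, rng.randrange(10**6), rng.randrange(10**9))
            for _ in range(30)]
           for uid in range(100)]

latest = readfilter(friends)
total_read = sum(len(records) for records in friends)
```

Here `total_read` is 3000 and `len(latest)` is 10, so for this workload the reply carries 300 times fewer records than a plain multi-get of the same keys would. The result is ordered newest first.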
> We created a multithreaded client application that issues READ queries 
> asynchronously (multiple threads issue and process the queries). READ 
> queries are issued for a varying number of users, from 1 user to 30000 
> users. The time taken to complete these queries is used to compute 
> throughput and latency. See the attachments for the performance results.
>
